{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Learn to Crawl an Entire Website with Python in 5 Minutes"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
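    "Steps for crawling a website:\n",
    "1. Define the crawl target\n",
    "   * Target site: my own blog, 疯狂的蚂蚁 (Crazy Ant), http://www.crazyant.net\n",
    "   * Target data: the link, title, and tags of every blog post\n",
    "2. Analyze the target site\n",
    "   * Pages to crawl: http://www.crazyant.net/page/1 through http://www.crazyant.net/page/24\n",
    "   * Data to extract: the text and href of the link inside each `h2` element with `class=entry-title`, plus the post's tag list\n",
    "3. Download the HTML in bulk\n",
    "   * Use the requests library; docs: https://2.python-requests.org//zh_CN/latest/user/quickstart.html\n",
    "4. Parse the HTML to extract the target data\n",
    "   * Use the BeautifulSoup library; docs: https://beautifulsoup.readthedocs.io/zh_CN/v4.4.0/\n",
    "5. Store the results\n",
    "   * `json.dumps` can serialize the data for storage"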
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import requests\n",
    "from bs4 import BeautifulSoup\n",
    "import pprint\n",
    "import json"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. Download the HTML of every page"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
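    "def download_all_htmls():\n",
    "    \"\"\"\n",
    "    Download the HTML of every list page for later parsing.\n",
    "    \"\"\"\n",
    "    htmls = []\n",
    "    # The blog has 24 list pages: /page/1 through /page/24.\n",
    "    for idx in range(24):\n",
    "        url = f\"http://www.crazyant.net/page/{idx+1}\"\n",
    "        print(\"craw html:\", url)\n",
    "        r = requests.get(url)\n",
    "        # Stop early if any page fails, and say which one.\n",
    "        if r.status_code != 200:\n",
    "            raise Exception(f\"failed to download {url}: HTTP {r.status_code}\")\n",
    "        htmls.append(r.text)\n",
    "    return htmls"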
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "craw html: http://www.crazyant.net/page/1\n",
      "craw html: http://www.crazyant.net/page/2\n",
      "craw html: http://www.crazyant.net/page/3\n",
      "craw html: http://www.crazyant.net/page/4\n",
      "craw html: http://www.crazyant.net/page/5\n",
      "craw html: http://www.crazyant.net/page/6\n",
      "craw html: http://www.crazyant.net/page/7\n",
      "craw html: http://www.crazyant.net/page/8\n",
      "craw html: http://www.crazyant.net/page/9\n",
      "craw html: http://www.crazyant.net/page/10\n",
      "craw html: http://www.crazyant.net/page/11\n",
      "craw html: http://www.crazyant.net/page/12\n",
      "craw html: http://www.crazyant.net/page/13\n",
      "craw html: http://www.crazyant.net/page/14\n",
      "craw html: http://www.crazyant.net/page/15\n",
      "craw html: http://www.crazyant.net/page/16\n",
      "craw html: http://www.crazyant.net/page/17\n",
      "craw html: http://www.crazyant.net/page/18\n",
      "craw html: http://www.crazyant.net/page/19\n",
      "craw html: http://www.crazyant.net/page/20\n",
      "craw html: http://www.crazyant.net/page/21\n",
      "craw html: http://www.crazyant.net/page/22\n",
      "craw html: http://www.crazyant.net/page/23\n",
      "craw html: http://www.crazyant.net/page/24\n"
     ]
    }
   ],
   "source": [
    "# Run the crawl\n",
    "htmls = download_all_htmls()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'<!DOCTYPE html><html lang=\"zh-CN\" class=\"no-js\"><head><meta charset=\"UTF-8\"><meta name=\"viewport\" content=\"width=device-width\"><link rel=\"profile\" href=\"http://gmpg.org/xfn/11\"><link rel=\"pingback\" href=\"http://www.crazyant.net/xmlrpc.php\"> <!--[if lt IE 9]> <script src=\"http://www.crazyant.net/wp-content/themes/twentyfifteen/js/html5.js\"></script> <![endif]--> <script>(function(html){html.className = html.className.replace(/\\\\bno-js\\\\b/,\\'js\\')})(document.documentElement);</script> <title>蚂蚁学Python &#8211; 生命不止，探索不息</title><link rel=\\'dns-prefetch\\' href=\\'//cdn.bibblio.org\\' /><link rel=\"alternate\" type=\"application/rss+xml\" title=\"蚂蚁学Python &raquo; Feed\" href=\"http://www.crazyant.net/feed\" /><link rel=\"alternate\" type=\"application/rss+xml\" title=\"蚂蚁学Python &raquo; 评论Feed\" href=\"http://www.crazyant.net/comments/feed\" /> <!-- managing ads with Advanced Ads – https://wpadvancedads.com/ --><script>advanced_ads_ready=function(){var fns=[],listener,doc=typeof document===\"object\"&&document,hack=doc&&doc.documentElement.doScroll,domContentLoaded=\"DOMContentLoaded\",loaded=doc&&(hack?/^loaded|^c/:/^loaded|^i|^c/).test(doc.readyState);if(!loaded&&doc){listener=function(){doc.removeEventListener(domContentLoaded,listener);window.removeEventListener(\"load\",listener);loaded=1;while(listener=fns.shift())listener()};doc.addEventListener(domContentLoaded,listener);window.addEventListener(\"load\",listener)}return function(fn){loaded?setTimeout(fn,0):fns.push(fn)}}();</script><link rel=\\'stylesheet\\' id=\\'wp-block-library-css\\'  href=\\'http://www.crazyant.net/wp-includes/css/dist/block-library/style.min.css\\' type=\\'text/css\\' media=\\'all\\' /><link rel=\\'stylesheet\\' id=\\'wp-block-library-theme-css\\'  href=\\'http://www.crazyant.net/wp-includes/css/dist/block-library/theme.min.css\\' type=\\'text/css\\' media=\\'all\\' /><link rel=\\'stylesheet\\' id=\\'bibblio_related_posts-css\\'  
href=\\'http://www.crazyant.net/wp-content/plugins/bibblio-related-posts/public/css/bibblio_related_posts.min.css\\' type=\\'text/css\\' media=\\'all\\' /><link rel=\\'stylesheet\\' id=\\'bibblio-rcm-css-css\\'  href=\\'//cdn.bibblio.org/rcm/4.6/bib-related-content.css?ver=5.2.4\\' type=\\'text/css\\' media=\\'all\\' /><link rel=\\'stylesheet\\' id=\\'my-style-css\\'  href=\\'http://www.crazyant.net/wp-content/plugins/cardoza-3d-tag-cloud//public/css/my-style.min.css\\' type=\\'text/css\\' media=\\'all\\' /><link rel=\\'stylesheet\\' id=\\'toc-screen-css\\'  href=\\'http://www.crazyant.net/wp-content/plugins/table-of-contents-plus/screen.min.css\\' type=\\'text/css\\' media=\\'all\\' /><link rel=\\'stylesheet\\' id=\\'wordpress-popular-posts-css-css\\'  href=\\'http://www.crazyant.net/wp-content/plugins/wordpress-popular-posts/public/css/wordpress-popular-posts-css.min.css\\' type=\\'text/css\\' media=\\'all\\' /><link rel=\\'stylesheet\\' id=\\'genericons-css\\'  href=\\'http://www.crazyant.net/wp-content/themes/twentyfifteen/genericons/genericons.min.css\\' type=\\'text/css\\' media=\\'all\\' /><link rel=\\'stylesheet\\' id=\\'twentyfifteen-style-css\\'  href=\\'http://www.crazyant.net/wp-content/themes/twentyfifteen/twentyfifteen-style.min.css\\' type=\\'text/css\\' media=\\'all\\' /><link rel=\\'stylesheet\\' id=\\'twentyfifteen-block-style-css\\'  href=\\'http://www.crazyant.net/wp-content/themes/twentyfifteen/css/twentyfifteen-block-style.min.css\\' type=\\'text/css\\' media=\\'all\\' /> <!--[if lt IE 9]><link rel=\\'stylesheet\\' id=\\'twentyfifteen-ie-css\\'  href=\\'http://www.crazyant.net/wp-content/themes/twentyfifteen/css/twentyfifteen-ie.min.css\\' type=\\'text/css\\' media=\\'all\\' /> <![endif]--> <!--[if lt IE 8]><link rel=\\'stylesheet\\' id=\\'twentyfifteen-ie7-css\\'  href=\\'http://www.crazyant.net/wp-content/themes/twentyfifteen/css/twentyfifteen-ie7.min.css\\' type=\\'text/css\\' media=\\'all\\' /> <![endif]--><link rel=\\'stylesheet\\' 
id=\\'fancybox-css\\'  href=\\'http://www.crazyant.net/wp-content/plugins/easy-fancybox/css/jquery.fancybox.min.css\\' type=\\'text/css\\' media=\\'screen\\' /> <script type=\\'text/javascript\\' src=\\'http://www.crazyant.net/wp-includes/js/jquery/jquery.js\\'></script> <script async type=\\'text/javascript\\' src=\\'http://www.crazyant.net/wp-includes/js/jquery/jquery-migrate.min.js\\'></script> <script async type=\\'text/javascript\\' src=\\'http://www.crazyant.net/wp-content/uploads/siteground-optimizer-assets/bibblio_related_posts.min.js\\'></script> <script async type=\\'text/javascript\\' src=\\'http://www.crazyant.net/wp-content/plugins/cardoza-3d-tag-cloud/jquery.tagcanvas.min.js\\'></script> <script type=\\'text/javascript\\'>/* <![CDATA[ */\\nvar wpp_params = {\"sampling_active\":\"0\",\"sampling_rate\":\"100\",\"ajax_url\":\"http:\\\\/\\\\/www.crazyant.net\\\\/wp-json\\\\/wordpress-popular-posts\\\\/v1\\\\/popular-posts\\\\/\",\"ID\":\"\",\"token\":\"f1a38a8bc9\",\"debug\":\"\"};\\n/* ]]> */</script> <script async type=\\'text/javascript\\' src=\\'http://www.crazyant.net/wp-content/plugins/wordpress-popular-posts/public/js/wpp-4.2.0.min.js\\'></script> <link rel=\\'https://api.w.org/\\' href=\\'http://www.crazyant.net/wp-json/\\' /><link rel=\"EditURI\" type=\"application/rsd+xml\" title=\"RSD\" href=\"http://www.crazyant.net/xmlrpc.php?rsd\" /><link rel=\"wlwmanifest\" type=\"application/wlwmanifest+xml\" href=\"http://www.crazyant.net/wp-includes/wlwmanifest.xml\" /><meta name=\"generator\" content=\"WordPress 5.2.4\" /> <script type=\"text/javascript\">$j = jQuery.noConflict();\\n\\t\\t$j(document).ready(function() {\\n\\t\\t\\tif(!$j(\\'#myCanvas\\').tagcanvas({\\n\\t\\t\\t\\ttextColour: \\'#333333\\',\\n\\t\\t\\t\\toutlineColour: \\'#ffffff\\',\\n\\t\\t\\t\\treverse: true,\\n\\t\\t\\t\\tdepth: 0.8,\\n\\t\\t\\t\\ttextFont: null,\\n\\t\\t\\t\\tweight: true,\\n\\t\\t\\t\\tmaxSpeed: 0.05\\n\\t\\t\\t},\\'tags\\')) 
{\\n\\t\\t\\t\\t$j(\\'#myCanvasContainer\\').hide();\\n\\t\\t\\t}\\n\\t\\t});</script> <script type=\\'text/javascript\\'>// <![CDATA[\\n    var ajaxUrl = \"http://www.crazyant.net/wp-admin/admin-ajax.php\";\\n    //]]></script> <style type=\"text/css\">.recentcomments a{display:inline !important;padding:0 !important;margin:0 !important;}</style> <script>var _hmt = _hmt || [];\\n(function() {\\n  var hm = document.createElement(\"script\");\\n  hm.src = \"https://hm.baidu.com/hm.js?4c9637db87f741d7588ff42a2a9c057d\";\\n  var s = document.getElementsByTagName(\"script\")[0]; \\n  s.parentNode.insertBefore(hm, s);\\n})();</script> </head><body class=\"home blog wp-embed-responsive\"><div id=\"page\" class=\"hfeed site\"> <a class=\"skip-link screen-reader-text\" href=\"#content\">跳至内容</a><div id=\"sidebar\" class=\"sidebar\"><header id=\"masthead\" class=\"site-header\" role=\"banner\"><div class=\"site-branding\"><h1 class=\"site-title\"><a href=\"http://www.crazyant.net/\" rel=\"home\">蚂蚁学Python</a></h1><p class=\"site-description\">生命不止，探索不息</p> <button class=\"secondary-toggle\">菜单和挂件</button></div><!-- .site-branding --></header><!-- .site-header --><div id=\"secondary\" class=\"secondary\"><nav id=\"site-navigation\" class=\"main-navigation\" role=\"navigation\"><div class=\"menu-%e5%af%bc%e8%88%aa%e6%a0%8f-container\"><ul id=\"menu-%e5%af%bc%e8%88%aa%e6%a0%8f\" class=\"nav-menu\"><li id=\"menu-item-862\" class=\"menu-item menu-item-type-custom menu-item-object-custom menu-item-862\"><a href=\"http://crazyant.net/\">首页</a></li><li id=\"menu-item-2482\" class=\"menu-item menu-item-type-taxonomy menu-item-object-category menu-item-2482\"><a href=\"http://www.crazyant.net/category/python-solvedoubts\">Python-答疑解惑</a></li><li id=\"menu-item-2475\" class=\"menu-item menu-item-type-taxonomy menu-item-object-category menu-item-2475\"><a href=\"http://www.crazyant.net/category/python-basic\">Python-基础知识</a></li><li id=\"menu-item-2527\" class=\"menu-item 
menu-item-type-taxonomy menu-item-object-category menu-item-2527\"><a href=\"http://www.crazyant.net/category/python-%e7%88%ac%e8%99%ab%e7%b3%bb%e5%88%97\">Python-爬虫系列</a></li><li id=\"menu-item-2505\" class=\"menu-item menu-item-type-taxonomy menu-item-object-category menu-item-2505\"><a href=\"http://www.crazyant.net/category/python-pandas%e7%b3%bb%e5%88%97\">Python-Pandas系列</a></li><li id=\"menu-item-2476\" class=\"menu-item menu-item-type-taxonomy menu-item-object-category menu-item-2476\"><a href=\"http://www.crazyant.net/category/python-web\">Python-Web开发</a></li><li id=\"menu-item-2478\" class=\"menu-item menu-item-type-taxonomy menu-item-object-category menu-item-2478\"><a href=\"http://www.crazyant.net/category/python-bigdata\">Python-大数据</a></li><li id=\"menu-item-2576\" class=\"menu-item menu-item-type-taxonomy menu-item-object-category menu-item-2576\"><a href=\"http://www.crazyant.net/category/python-spark\">Python-Spark</a></li><li id=\"menu-item-2479\" class=\"menu-item menu-item-type-taxonomy menu-item-object-category menu-item-2479\"><a href=\"http://www.crazyant.net/category/python-data-analysis\">Python-数据分析</a></li><li id=\"menu-item-2480\" class=\"menu-item menu-item-type-taxonomy menu-item-object-category menu-item-2480\"><a href=\"http://www.crazyant.net/category/python-machinelearning\">Python-机器学习</a></li><li id=\"menu-item-2468\" class=\"menu-item menu-item-type-taxonomy menu-item-object-category menu-item-2468\"><a href=\"http://www.crazyant.net/category/%e6%8e%a8%e8%8d%90%e7%b3%bb%e7%bb%9f\">推荐系统合集</a></li><li id=\"menu-item-2328\" class=\"menu-item menu-item-type-post_type menu-item-object-page menu-item-has-children menu-item-2328\"><a href=\"http://www.crazyant.net/%e5%85%b3%e4%ba%8e\">关于我</a><ul class=\"sub-menu\"><li id=\"menu-item-861\" class=\"menu-item menu-item-type-post_type menu-item-object-page menu-item-861\"><a href=\"http://www.crazyant.net/%e7%95%99%e8%a8%80%e5%b0%8f%e6%9c%ac\">留言小本</a></li><li id=\"menu-item-1844\" 
class=\"menu-item menu-item-type-post_type menu-item-object-page menu-item-1844\"><a href=\"http://www.crazyant.net/meditation-resource\">冥想资料</a></li><li id=\"menu-item-1941\" class=\"menu-item menu-item-type-post_type menu-item-object-page menu-item-1941\"><a href=\"http://www.crazyant.net/my_program_notes\">编程笔记</a></li><li id=\"menu-item-697\" class=\"menu-item menu-item-type-post_type menu-item-object-page menu-item-697\"><a href=\"http://www.crazyant.net/%e5%b8%b8%e7%94%a8%e8%b5%84%e6%ba%90\">常用资源</a></li><li id=\"menu-item-1875\" class=\"menu-item menu-item-type-post_type menu-item-object-page menu-item-1875\"><a href=\"http://www.crazyant.net/my_book_list\">个人书单</a></li><li id=\"menu-item-1866\" class=\"menu-item menu-item-type-post_type menu-item-object-page menu-item-1866\"><a href=\"http://www.crazyant.net/%e4%b8%aa%e4%ba%ba%e7%ae%b4%e8%a8%80\">个人箴言</a></li></ul></li></ul></div></nav><!-- .main-navigation --><div id=\"widget-area\" class=\"widget-area\" role=\"complementary\"><aside class=\"widget crazy-widget\"><h2 class=\"widget-title\">视频公众号：蚂蚁学Python</h2><a href=\"http://zhishi.iqiyi.com/shop/#/home/P812d55d7c6d344c7a17365a452bb9e52.html\"><img width=\"344\" height=\"344\" src=\\'http://www.crazyant.net/wp-content/uploads/2019/08/小图.jpg\\' alt=\\'\\'  /></a></aside><aside id=\"search-4\" class=\"widget widget_search\"><form role=\"search\" method=\"get\" class=\"search-form\" action=\"http://www.crazyant.net/\"> <label> <span class=\"screen-reader-text\">搜索：</span> <input type=\"search\" class=\"search-field\" placeholder=\"搜索&hellip;\" value=\"\" name=\"s\" /> </label> <input type=\"submit\" class=\"search-submit screen-reader-text\" value=\"搜索\" /></form></aside><aside id=\"recent-posts-9\" class=\"widget widget_recent_entries\"><h2 class=\"widget-title\">近期文章</h2><ul><li> <a href=\"http://www.crazyant.net/2585.html\">Tensorflow怎样接收变长列表特征</a></li><li> <a href=\"http://www.crazyant.net/2583.html\">Pandas实现数据的合并concat</a></li><li> <a 
href=\"http://www.crazyant.net/2574.html\">Pandas的Index索引有什么用途？</a></li><li> <a href=\"http://www.crazyant.net/2564.html\">机器学习常用数据集大全</a></li><li> <a href=\"http://www.crazyant.net/2561.html\">一个数据科学家的修炼路径</a></li></ul></aside><aside id=\"categories-13\" class=\"widget widget_categories\"><h2 class=\"widget-title\">分类目录</h2><form action=\"http://www.crazyant.net\" method=\"get\"><label class=\"screen-reader-text\" for=\"cat\">分类目录</label><select  name=\\'cat\\' id=\\'cat\\' class=\\'postform\\' ><option value=\\'-1\\'>选择分类目录</option><option class=\"level-0\" value=\"7\">c++</option><option class=\"level-0\" value=\"209\">flask</option><option class=\"level-0\" value=\"136\">hadoop</option><option class=\"level-0\" value=\"145\">hive</option><option class=\"level-0\" value=\"134\">java</option><option class=\"level-0\" value=\"36\">mysql</option><option class=\"level-0\" value=\"243\">pandas</option><option class=\"level-0\" value=\"8\">php</option><option class=\"level-0\" value=\"111\">python</option><option class=\"level-0\" value=\"255\">Python-Pandas系列</option><option class=\"level-0\" value=\"260\">Python-Tensorflow</option><option class=\"level-0\" value=\"251\">Python-机器学习</option><option class=\"level-0\" value=\"257\">Python-爬虫系列</option><option class=\"level-0\" value=\"253\">Python-答疑解惑</option><option class=\"level-0\" value=\"144\">shell</option><option class=\"level-0\" value=\"200\">spark</option><option class=\"level-0\" value=\"202\">tensorflow</option><option class=\"level-0\" value=\"151\">web</option><option class=\"level-0\" value=\"121\">wordpress</option><option class=\"level-0\" value=\"149\">个人旅程</option><option class=\"level-0\" value=\"150\">基础知识</option><option class=\"level-0\" value=\"131\">工具软件</option><option class=\"level-0\" value=\"203\">推荐系统</option><option class=\"level-0\" value=\"152\">操作系统</option><option class=\"level-0\" value=\"148\">数据采集</option><option class=\"level-0\" value=\"211\">数据驱动</option><option 
class=\"level-0\" value=\"86\">未分类</option><option class=\"level-0\" value=\"216\">机器学习</option><option class=\"level-0\" value=\"205\">程序人生</option><option class=\"level-0\" value=\"133\">站长</option> </select></form> <script type=\\'text/javascript\\'>/* <![CDATA[ */\\n(function() {\\n\\tvar dropdown = document.getElementById( \"cat\" );\\n\\tfunction onCatChange() {\\n\\t\\tif ( dropdown.options[ dropdown.selectedIndex ].value > 0 ) {\\n\\t\\t\\tdropdown.parentNode.submit();\\n\\t\\t}\\n\\t}\\n\\tdropdown.onchange = onCatChange;\\n})();\\n/* ]]> */</script> </aside><aside id=\"recent-comments-5\" class=\"widget widget_recent_comments\"><h2 class=\"widget-title\">近期评论</h2><ul id=\"recentcomments\"><li class=\"recentcomments\"><span class=\"comment-author-link\"><a href=\\'http://crazyant.net\\' rel=\\'external nofollow\\' class=\\'url\\'>crazyant</a></span>发表在《<a href=\"http://www.crazyant.net/2404.html#comment-28288\">听樊登的《非暴力沟通》</a>》</li><li class=\"recentcomments\"><span class=\"comment-author-link\"><a href=\\'http://blog.antior.cn\\' rel=\\'external nofollow\\' class=\\'url\\'>antior</a></span>发表在《<a href=\"http://www.crazyant.net/2404.html#comment-28287\">听樊登的《非暴力沟通》</a>》</li><li class=\"recentcomments\"><span class=\"comment-author-link\"><a href=\\'http://crazyant.net\\' rel=\\'external nofollow\\' class=\\'url\\'>crazyant</a></span>发表在《<a href=\"http://www.crazyant.net/my_book_list#comment-28278\">个人书单</a>》</li><li class=\"recentcomments\"><span class=\"comment-author-link\">d</span>发表在《<a href=\"http://www.crazyant.net/my_book_list#comment-28277\">个人书单</a>》</li><li class=\"recentcomments\"><span class=\"comment-author-link\">赖文伟</span>发表在《<a href=\"http://www.crazyant.net/2145.html#comment-28042\">快速找到Tomcat中最耗CPU的线程</a>》</li></ul></aside><aside id=\"tag_cloud-5\" class=\"widget widget_tag_cloud\"><h2 class=\"widget-title\">标签</h2><div class=\"tagcloud\"><ul class=\\'wp-tag-cloud\\' role=\\'list\\'><li><a href=\"http://www.crazyant.net/tag/apache\" 
class=\"tag-cloud-link tag-link-233 tag-link-position-1\" style=\"font-size: 9.6912751677852pt;\" aria-label=\"apache (2个项目)\">apache</a></li><li><a href=\"http://www.crazyant.net/tag/c\" class=\"tag-cloud-link tag-link-69 tag-link-position-2\" style=\"font-size: 14.577181208054pt;\" aria-label=\"c++ (9个项目)\">c++</a></li><li><a href=\"http://www.crazyant.net/tag/django\" class=\"tag-cloud-link tag-link-118 tag-link-position-3\" style=\"font-size: 13.167785234899pt;\" aria-label=\"django (6个项目)\">django</a></li><li><a href=\"http://www.crazyant.net/tag/excel\" class=\"tag-cloud-link tag-link-230 tag-link-position-4\" style=\"font-size: 9.6912751677852pt;\" aria-label=\"excel (2个项目)\">excel</a></li><li><a href=\"http://www.crazyant.net/tag/flask\" class=\"tag-cloud-link tag-link-210 tag-link-position-5\" style=\"font-size: 9.6912751677852pt;\" aria-label=\"flask (2个项目)\">flask</a></li><li><a href=\"http://www.crazyant.net/tag/hadoop\" class=\"tag-cloud-link tag-link-173 tag-link-position-6\" style=\"font-size: 13.167785234899pt;\" aria-label=\"hadoop (6个项目)\">hadoop</a></li><li><a href=\"http://www.crazyant.net/tag/hive\" class=\"tag-cloud-link tag-link-175 tag-link-position-7\" style=\"font-size: 16.268456375839pt;\" aria-label=\"hive (14个项目)\">hive</a></li><li><a href=\"http://www.crazyant.net/tag/java\" class=\"tag-cloud-link tag-link-20 tag-link-position-8\" style=\"font-size: 18.335570469799pt;\" aria-label=\"java (24个项目)\">java</a></li><li><a href=\"http://www.crazyant.net/tag/javascript\" class=\"tag-cloud-link tag-link-21 tag-link-position-9\" style=\"font-size: 13.167785234899pt;\" aria-label=\"javascript (6个项目)\">javascript</a></li><li><a href=\"http://www.crazyant.net/tag/jquery\" class=\"tag-cloud-link tag-link-48 tag-link-position-10\" style=\"font-size: 9.6912751677852pt;\" aria-label=\"jquery (2个项目)\">jquery</a></li><li><a href=\"http://www.crazyant.net/tag/jvm\" class=\"tag-cloud-link tag-link-166 tag-link-position-11\" style=\"font-size: 
10.818791946309pt;\" aria-label=\"jvm (3个项目)\">jvm</a></li><li><a href=\"http://www.crazyant.net/tag/linux\" class=\"tag-cloud-link tag-link-59 tag-link-position-12\" style=\"font-size: 14.107382550336pt;\" aria-label=\"linux (8个项目)\">linux</a></li><li><a href=\"http://www.crazyant.net/tag/mac\" class=\"tag-cloud-link tag-link-186 tag-link-position-13\" style=\"font-size: 9.6912751677852pt;\" aria-label=\"mac (2个项目)\">mac</a></li><li><a href=\"http://www.crazyant.net/tag/maven\" class=\"tag-cloud-link tag-link-222 tag-link-position-14\" style=\"font-size: 9.6912751677852pt;\" aria-label=\"maven (2个项目)\">maven</a></li><li><a href=\"http://www.crazyant.net/tag/mybatis\" class=\"tag-cloud-link tag-link-187 tag-link-position-15\" style=\"font-size: 9.6912751677852pt;\" aria-label=\"mybatis (2个项目)\">mybatis</a></li><li><a href=\"http://www.crazyant.net/tag/mysql\" class=\"tag-cloud-link tag-link-169 tag-link-position-16\" style=\"font-size: 18.61744966443pt;\" aria-label=\"mysql (26个项目)\">mysql</a></li><li><a href=\"http://www.crazyant.net/tag/pandas\" class=\"tag-cloud-link tag-link-244 tag-link-position-17\" style=\"font-size: 16.738255033557pt;\" aria-label=\"pandas (16个项目)\">pandas</a></li><li><a href=\"http://www.crazyant.net/tag/php\" class=\"tag-cloud-link tag-link-17 tag-link-position-18\" style=\"font-size: 21.248322147651pt;\" aria-label=\"php (50个项目)\">php</a></li><li><a href=\"http://www.crazyant.net/tag/python\" class=\"tag-cloud-link tag-link-170 tag-link-position-19\" style=\"font-size: 22pt;\" aria-label=\"python (60个项目)\">python</a></li><li><a href=\"http://www.crazyant.net/tag/qt\" class=\"tag-cloud-link tag-link-236 tag-link-position-20\" style=\"font-size: 9.6912751677852pt;\" aria-label=\"qt (2个项目)\">qt</a></li><li><a href=\"http://www.crazyant.net/tag/redis\" class=\"tag-cloud-link tag-link-214 tag-link-position-21\" style=\"font-size: 10.818791946309pt;\" aria-label=\"redis (3个项目)\">redis</a></li><li><a href=\"http://www.crazyant.net/tag/seo\" 
class=\"tag-cloud-link tag-link-110 tag-link-position-22\" style=\"font-size: 9.6912751677852pt;\" aria-label=\"seo (2个项目)\">seo</a></li><li><a href=\"http://www.crazyant.net/tag/shell\" class=\"tag-cloud-link tag-link-174 tag-link-position-23\" style=\"font-size: 13.637583892617pt;\" aria-label=\"shell (7个项目)\">shell</a></li><li><a href=\"http://www.crazyant.net/tag/spark\" class=\"tag-cloud-link tag-link-198 tag-link-position-24\" style=\"font-size: 11.758389261745pt;\" aria-label=\"spark (4个项目)\">spark</a></li><li><a href=\"http://www.crazyant.net/tag/svn\" class=\"tag-cloud-link tag-link-122 tag-link-position-25\" style=\"font-size: 9.6912751677852pt;\" aria-label=\"svn (2个项目)\">svn</a></li><li><a href=\"http://www.crazyant.net/tag/tensorflow\" class=\"tag-cloud-link tag-link-199 tag-link-position-26\" style=\"font-size: 10.818791946309pt;\" aria-label=\"tensorflow (3个项目)\">tensorflow</a></li><li><a href=\"http://www.crazyant.net/tag/tomcat\" class=\"tag-cloud-link tag-link-213 tag-link-position-27\" style=\"font-size: 9.6912751677852pt;\" aria-label=\"tomcat (2个项目)\">tomcat</a></li><li><a href=\"http://www.crazyant.net/tag/ubuntu\" class=\"tag-cloud-link tag-link-55 tag-link-position-28\" style=\"font-size: 13.167785234899pt;\" aria-label=\"ubuntu (6个项目)\">ubuntu</a></li><li><a href=\"http://www.crazyant.net/tag/vim\" class=\"tag-cloud-link tag-link-14 tag-link-position-29\" style=\"font-size: 8pt;\" aria-label=\"vim (1个项目)\">vim</a></li><li><a href=\"http://www.crazyant.net/tag/win7\" class=\"tag-cloud-link tag-link-226 tag-link-position-30\" style=\"font-size: 12.510067114094pt;\" aria-label=\"win7 (5个项目)\">win7</a></li><li><a href=\"http://www.crazyant.net/tag/word\" class=\"tag-cloud-link tag-link-229 tag-link-position-31\" style=\"font-size: 9.6912751677852pt;\" aria-label=\"word (2个项目)\">word</a></li><li><a href=\"http://www.crazyant.net/tag/wordpress\" class=\"tag-cloud-link tag-link-171 tag-link-position-32\" style=\"font-size: 10.818791946309pt;\" 
aria-label=\"wordpress (3个项目)\">wordpress</a></li><li><a href=\"http://www.crazyant.net/tag/%e5%a4%a7%e6%95%b0%e6%8d%ae\" class=\"tag-cloud-link tag-link-207 tag-link-position-33\" style=\"font-size: 10.818791946309pt;\" aria-label=\"大数据 (3个项目)\">大数据</a></li><li><a href=\"http://www.crazyant.net/tag/%e5%ae%89%e5%85%a8\" class=\"tag-cloud-link tag-link-16 tag-link-position-34\" style=\"font-size: 8pt;\" aria-label=\"安全 (1个项目)\">安全</a></li><li><a href=\"http://www.crazyant.net/tag/%e6%8e%a8%e8%8d%90%e7%b3%bb%e7%bb%9f\" class=\"tag-cloud-link tag-link-204 tag-link-position-35\" style=\"font-size: 11.758389261745pt;\" aria-label=\"推荐系统 (4个项目)\">推荐系统</a></li><li><a href=\"http://www.crazyant.net/tag/%e6%93%8d%e4%bd%9c%e7%b3%bb%e7%bb%9f\" class=\"tag-cloud-link tag-link-238 tag-link-position-36\" style=\"font-size: 9.6912751677852pt;\" aria-label=\"操作系统 (2个项目)\">操作系统</a></li><li><a href=\"http://www.crazyant.net/tag/%e6%95%b0%e6%8d%ae%e5%88%86%e6%9e%90\" class=\"tag-cloud-link tag-link-256 tag-link-position-37\" style=\"font-size: 15.610738255034pt;\" aria-label=\"数据分析 (12个项目)\">数据分析</a></li><li><a href=\"http://www.crazyant.net/tag/%e6%95%b0%e6%8d%ae%e5%ba%93\" class=\"tag-cloud-link tag-link-23 tag-link-position-38\" style=\"font-size: 11.758389261745pt;\" aria-label=\"数据库 (4个项目)\">数据库</a></li><li><a href=\"http://www.crazyant.net/tag/%e6%9c%ba%e5%99%a8%e5%ad%a6%e4%b9%a0\" class=\"tag-cloud-link tag-link-215 tag-link-position-39\" style=\"font-size: 11.758389261745pt;\" aria-label=\"机器学习 (4个项目)\">机器学习</a></li><li><a href=\"http://www.crazyant.net/tag/%e7%88%ac%e8%99%ab\" class=\"tag-cloud-link tag-link-189 tag-link-position-40\" style=\"font-size: 15.328859060403pt;\" aria-label=\"爬虫 (11个项目)\">爬虫</a></li><li><a href=\"http://www.crazyant.net/tag/%e7%a8%8b%e5%ba%8f%e4%ba%ba%e7%94%9f\" class=\"tag-cloud-link tag-link-206 tag-link-position-41\" style=\"font-size: 17.58389261745pt;\" aria-label=\"程序人生 (20个项目)\">程序人生</a></li><li><a 
href=\"http://www.crazyant.net/tag/website\" class=\"tag-cloud-link tag-link-172 tag-link-position-42\" style=\"font-size: 10.818791946309pt;\" aria-label=\"站长 (3个项目)\">站长</a></li><li><a href=\"http://www.crazyant.net/tag/%e7%ae%97%e6%b3%95\" class=\"tag-cloud-link tag-link-208 tag-link-position-43\" style=\"font-size: 10.818791946309pt;\" aria-label=\"算法 (3个项目)\">算法</a></li><li><a href=\"http://www.crazyant.net/tag/%e7%bb%87%e6%a2%a6\" class=\"tag-cloud-link tag-link-130 tag-link-position-44\" style=\"font-size: 11.758389261745pt;\" aria-label=\"织梦 (4个项目)\">织梦</a></li><li><a href=\"http://www.crazyant.net/tag/%e8%ae%be%e8%ae%a1\" class=\"tag-cloud-link tag-link-159 tag-link-position-45\" style=\"font-size: 9.6912751677852pt;\" aria-label=\"设计 (2个项目)\">设计</a></li></ul></div></aside><aside id=\"wpp-2\" class=\"widget popular-posts\"><h2 class=\"widget-title\">热门文章</h2><!-- cached --> <!-- WordPress Popular Posts --><ul class=\"wpp-list\"><li> <a href=\"http://www.crazyant.net/2525.html\" title=\"3分钟Python爬取9000张表情包图片\" class=\"wpp-post-title\" target=\"_self\">3分钟Python爬取9000张表情包图片</a> <span class=\"wpp-meta post-stats\"><span class=\"wpp-views\">108 views</span></span></li><li> <a href=\"http://www.crazyant.net/2561.html\" title=\"一个数据科学家的修炼路径\" class=\"wpp-post-title\" target=\"_self\">一个数据科学家的修炼路径</a> <span class=\"wpp-meta post-stats\"><span class=\"wpp-views\">44 views</span></span></li><li> <a href=\"http://www.crazyant.net/2515.html\" title=\"Pandas怎样按条件删除行？\" class=\"wpp-post-title\" target=\"_self\">Pandas怎样按条件删除行？</a> <span class=\"wpp-meta post-stats\"><span class=\"wpp-views\">42 views</span></span></li><li> <a href=\"http://www.crazyant.net/2574.html\" title=\"Pandas的Index索引有什么用途？\" class=\"wpp-post-title\" target=\"_self\">Pandas的Index索引有什么用途？</a> <span class=\"wpp-meta post-stats\"><span class=\"wpp-views\">39 views</span></span></li><li> <a href=\"http://www.crazyant.net/2541.html\" title=\"Pandas怎样处理字符串？\" class=\"wpp-post-title\" 
target=\"_self\">Pandas怎样处理字符串？</a> <span class=\"wpp-meta post-stats\"><span class=\"wpp-views\">33 views</span></span></li><li> <a href=\"http://www.crazyant.net/2564.html\" title=\"机器学习常用数据集大全\" class=\"wpp-post-title\" target=\"_self\">机器学习常用数据集大全</a> <span class=\"wpp-meta post-stats\"><span class=\"wpp-views\">30 views</span></span></li><li> <a href=\"http://www.crazyant.net/2523.html\" title=\"Pandas系列 &#8211; 怎样新增数据列？\" class=\"wpp-post-title\" target=\"_self\">Pandas系列 &#8211; 怎样新增数据列？</a> <span class=\"wpp-meta post-stats\"><span class=\"wpp-views\">29 views</span></span></li><li> <a href=\"http://www.crazyant.net/2517.html\" title=\"Pandas怎样根据码表更新ID对应的名称？\" class=\"wpp-post-title\" target=\"_self\">Pandas怎样根据码表更新ID对应的名称？</a> <span class=\"wpp-meta post-stats\"><span class=\"wpp-views\">25 views</span></span></li><li> <a href=\"http://www.crazyant.net/2546.html\" title=\"Pandas的axis参数怎么理解？\" class=\"wpp-post-title\" target=\"_self\">Pandas的axis参数怎么理解？</a> <span class=\"wpp-meta post-stats\"><span class=\"wpp-views\">23 views</span></span></li><li> <a href=\"http://www.crazyant.net/2521.html\" title=\"Pandas系列 &#8211; 数据统计函数\" class=\"wpp-post-title\" target=\"_self\">Pandas系列 &#8211; 数据统计函数</a> <span class=\"wpp-meta post-stats\"><span class=\"wpp-views\">22 views</span></span></li></ul></aside><aside id=\"text-11\" class=\"widget widget_text\"><h2 class=\"widget-title\">分享文章</h2><div class=\"textwidget\"><div class=\"bdsharebuttonbox\"><a href=\"#\" class=\"bds_more\" data-cmd=\"more\"></a><a href=\"#\" class=\"bds_qzone\" data-cmd=\"qzone\" title=\"分享到QQ空间\"></a><a href=\"#\" class=\"bds_tsina\" data-cmd=\"tsina\" title=\"分享到新浪微博\"></a><a href=\"#\" class=\"bds_tqq\" data-cmd=\"tqq\" title=\"分享到腾讯微博\"></a><a href=\"#\" class=\"bds_renren\" data-cmd=\"renren\" title=\"分享到人人网\"></a><a href=\"#\" class=\"bds_weixin\" data-cmd=\"weixin\" title=\"分享到微信\"></a></div> 
<script>window._bd_share_config={\"common\":{\"bdSnsKey\":{},\"bdText\":\"\",\"bdMini\":\"2\",\"bdPic\":\"\",\"bdStyle\":\"0\",\"bdSize\":\"16\"},\"share\":{}};with(document)0[(getElementsByTagName(\\'head\\')[0]||body).appendChild(createElement(\\'script\\')).src=\\'http://bdimg.share.baidu.com/static/api/js/share.js?v=89860593.js?cdnversion=\\'+~(-new Date()/36e5)];</script></div></aside></div><!-- .widget-area --></div><!-- .secondary --></div><!-- .sidebar --><div id=\"content\" class=\"site-content\"><div id=\"primary\" class=\"content-area\"><main id=\"main\" class=\"site-main\" role=\"main\"><article id=\"post-2585\" class=\"post-2585 post type-post status-publish format-standard hentry category-python-tensorflow tag-python tag-tensorflow tag-259\"><header class=\"entry-header\"><h2 class=\"entry-title\"><a href=\"http://www.crazyant.net/2585.html\" rel=\"bookmark\">Tensorflow怎样接收变长列表特征</a></h2></header><!-- .entry-header --><div class=\"entry-summary\"><p>比如用户的分类偏好、用户的历史观影行为，都是变长的元素列表，怎么输入到模型？ 这个问题很多人遇到，比如这些：  &hellip; <a href=\"http://www.crazyant.net/2585.html\" class=\"more-link\">继续阅读<span class=\"screen-reader-text\">Tensorflow怎样接收变长列表特征</span></a></p></div><!-- .entry-summary --><footer class=\"entry-footer\"> <span class=\"posted-on\"><span class=\"screen-reader-text\">发布于 </span><a href=\"http://www.crazyant.net/2585.html\" rel=\"bookmark\"><time class=\"entry-date published\" datetime=\"2019-10-17T03:06:15+00:00\">2019-10-17</time><time class=\"updated\" datetime=\"2019-10-17T03:09:53+00:00\">2019-10-17</time></a></span><span class=\"byline\"><span class=\"author vcard\"><span class=\"screen-reader-text\">作者 </span><a class=\"url fn n\" href=\"http://www.crazyant.net/author/peishuai1987\">crazyant</a></span></span><span class=\"cat-links\"><span class=\"screen-reader-text\">分类 </span><a href=\"http://www.crazyant.net/category/python-tensorflow\" rel=\"category tag\">Python-Tensorflow</a></span><span class=\"tags-links\"><span 
class=\"screen-reader-text\">标签 </span><a href=\"http://www.crazyant.net/tag/python\" rel=\"tag\">python</a>、<a href=\"http://www.crazyant.net/tag/tensorflow\" rel=\"tag\">tensorflow</a>、<a href=\"http://www.crazyant.net/tag/%e7%89%b9%e5%be%81%e5%b7%a5%e7%a8%8b\" rel=\"tag\">特征工程</a></span><span class=\"comments-link\"><a href=\"http://www.crazyant.net/2585.html#respond\"><span class=\"screen-reader-text\">于Tensorflow怎样接收变长列表特征</span>留下评论</a></span></footer><!-- .entry-footer --></article><!-- #post-2585 --><article id=\"post-2583\" class=\"post-2583 post type-post status-publish format-standard hentry category-python-pandas tag-pandas tag-python tag-256\"><header class=\"entry-header\"><h2 class=\"entry-title\"><a href=\"http://www.crazyant.net/2583.html\" rel=\"bookmark\">Pandas实现数据的合并concat</a></h2></header><!-- .entry-header --><div class=\"entry-summary\"><p>使用场景： 批量合并相同格式的Excel、给DataFrame添加行、给DataFrame添加列 一句话说明c &hellip; <a href=\"http://www.crazyant.net/2583.html\" class=\"more-link\">继续阅读<span class=\"screen-reader-text\">Pandas实现数据的合并concat</span></a></p></div><!-- .entry-summary --><footer class=\"entry-footer\"> <span class=\"posted-on\"><span class=\"screen-reader-text\">发布于 </span><a href=\"http://www.crazyant.net/2583.html\" rel=\"bookmark\"><time class=\"entry-date published updated\" datetime=\"2019-10-16T16:08:43+00:00\">2019-10-16</time></a></span><span class=\"byline\"><span class=\"author vcard\"><span class=\"screen-reader-text\">作者 </span><a class=\"url fn n\" href=\"http://www.crazyant.net/author/peishuai1987\">crazyant</a></span></span><span class=\"cat-links\"><span class=\"screen-reader-text\">分类 </span><a href=\"http://www.crazyant.net/category/python-pandas%e7%b3%bb%e5%88%97\" rel=\"category tag\">Python-Pandas系列</a></span><span class=\"tags-links\"><span class=\"screen-reader-text\">标签 </span><a href=\"http://www.crazyant.net/tag/pandas\" rel=\"tag\">pandas</a>、<a href=\"http://www.crazyant.net/tag/python\" rel=\"tag\">python</a>、<a 
href=\"http://www.crazyant.net/tag/%e6%95%b0%e6%8d%ae%e5%88%86%e6%9e%90\" rel=\"tag\">数据分析</a></span><span class=\"comments-link\"><a href=\"http://www.crazyant.net/2583.html#respond\"><span class=\"screen-reader-text\">于Pandas实现数据的合并concat</span>留下评论</a></span></footer><!-- .entry-footer --></article><!-- #post-2583 --><article id=\"post-2574\" class=\"post-2574 post type-post status-publish format-standard hentry category-python-pandas tag-pandas tag-python tag-256\"><header class=\"entry-header\"><h2 class=\"entry-title\"><a href=\"http://www.crazyant.net/2574.html\" rel=\"bookmark\">Pandas的Index索引有什么用途？</a></h2></header><!-- .entry-header --><div class=\"entry-summary\"><p>把数据存储于普通的column列也能用于数据查询，那使用index有什么好处？ index的用途总结： 1.  &hellip; <a href=\"http://www.crazyant.net/2574.html\" class=\"more-link\">继续阅读<span class=\"screen-reader-text\">Pandas的Index索引有什么用途？</span></a></p></div><!-- .entry-summary --><footer class=\"entry-footer\"> <span class=\"posted-on\"><span class=\"screen-reader-text\">发布于 </span><a href=\"http://www.crazyant.net/2574.html\" rel=\"bookmark\"><time class=\"entry-date published updated\" datetime=\"2019-10-10T23:51:14+00:00\">2019-10-10</time></a></span><span class=\"byline\"><span class=\"author vcard\"><span class=\"screen-reader-text\">作者 </span><a class=\"url fn n\" href=\"http://www.crazyant.net/author/peishuai1987\">crazyant</a></span></span><span class=\"cat-links\"><span class=\"screen-reader-text\">分类 </span><a href=\"http://www.crazyant.net/category/python-pandas%e7%b3%bb%e5%88%97\" rel=\"category tag\">Python-Pandas系列</a></span><span class=\"tags-links\"><span class=\"screen-reader-text\">标签 </span><a href=\"http://www.crazyant.net/tag/pandas\" rel=\"tag\">pandas</a>、<a href=\"http://www.crazyant.net/tag/python\" rel=\"tag\">python</a>、<a href=\"http://www.crazyant.net/tag/%e6%95%b0%e6%8d%ae%e5%88%86%e6%9e%90\" rel=\"tag\">数据分析</a></span><span class=\"comments-link\"><a href=\"http://www.crazyant.net/2574.html#respond\"><span 
class=\"screen-reader-text\">于Pandas的Index索引有什么用途？</span>留下评论</a></span></footer><!-- .entry-footer --></article><!-- #post-2574 --><article id=\"post-2564\" class=\"post-2564 post type-post status-publish format-standard hentry category-python-machinelearning tag-python tag-215\"><header class=\"entry-header\"><h2 class=\"entry-title\"><a href=\"http://www.crazyant.net/2564.html\" rel=\"bookmark\">机器学习常用数据集大全</a></h2></header><!-- .entry-header --><div class=\"entry-summary\"><p>UCI Machine Learning Adult Dataset Business Problem: Cl &hellip; <a href=\"http://www.crazyant.net/2564.html\" class=\"more-link\">继续阅读<span class=\"screen-reader-text\">机器学习常用数据集大全</span></a></p></div><!-- .entry-summary --><footer class=\"entry-footer\"> <span class=\"posted-on\"><span class=\"screen-reader-text\">发布于 </span><a href=\"http://www.crazyant.net/2564.html\" rel=\"bookmark\"><time class=\"entry-date published updated\" datetime=\"2019-10-10T10:51:40+00:00\">2019-10-10</time></a></span><span class=\"byline\"><span class=\"author vcard\"><span class=\"screen-reader-text\">作者 </span><a class=\"url fn n\" href=\"http://www.crazyant.net/author/peishuai1987\">crazyant</a></span></span><span class=\"cat-links\"><span class=\"screen-reader-text\">分类 </span><a href=\"http://www.crazyant.net/category/python-machinelearning\" rel=\"category tag\">Python-机器学习</a></span><span class=\"tags-links\"><span class=\"screen-reader-text\">标签 </span><a href=\"http://www.crazyant.net/tag/python\" rel=\"tag\">python</a>、<a href=\"http://www.crazyant.net/tag/%e6%9c%ba%e5%99%a8%e5%ad%a6%e4%b9%a0\" rel=\"tag\">机器学习</a></span><span class=\"comments-link\"><a href=\"http://www.crazyant.net/2564.html#respond\"><span class=\"screen-reader-text\">于机器学习常用数据集大全</span>留下评论</a></span></footer><!-- .entry-footer --></article><!-- #post-2564 --><article id=\"post-2561\" class=\"post-2561 post type-post status-publish format-standard hentry category-205 tag-256\"><header class=\"entry-header\"><h2 
class=\"entry-title\"><a href=\"http://www.crazyant.net/2561.html\" rel=\"bookmark\">一个数据科学家的修炼路径</a></h2></header><!-- .entry-header --><div class=\"entry-summary\"><p>来自一个视频： 数据科学家的需求层次，从底层往上层依次需要： COLLECT，数据收集 MOVE/STORE， &hellip; <a href=\"http://www.crazyant.net/2561.html\" class=\"more-link\">继续阅读<span class=\"screen-reader-text\">一个数据科学家的修炼路径</span></a></p></div><!-- .entry-summary --><footer class=\"entry-footer\"> <span class=\"posted-on\"><span class=\"screen-reader-text\">发布于 </span><a href=\"http://www.crazyant.net/2561.html\" rel=\"bookmark\"><time class=\"entry-date published updated\" datetime=\"2019-10-01T13:31:50+00:00\">2019-10-01</time></a></span><span class=\"byline\"><span class=\"author vcard\"><span class=\"screen-reader-text\">作者 </span><a class=\"url fn n\" href=\"http://www.crazyant.net/author/peishuai1987\">crazyant</a></span></span><span class=\"cat-links\"><span class=\"screen-reader-text\">分类 </span><a href=\"http://www.crazyant.net/category/%e7%a8%8b%e5%ba%8f%e4%ba%ba%e7%94%9f\" rel=\"category tag\">程序人生</a></span><span class=\"tags-links\"><span class=\"screen-reader-text\">标签 </span><a href=\"http://www.crazyant.net/tag/%e6%95%b0%e6%8d%ae%e5%88%86%e6%9e%90\" rel=\"tag\">数据分析</a></span><span class=\"comments-link\"><a href=\"http://www.crazyant.net/2561.html#respond\"><span class=\"screen-reader-text\">于一个数据科学家的修炼路径</span>留下评论</a></span></footer><!-- .entry-footer --></article><!-- #post-2561 --><article id=\"post-2546\" class=\"post-2546 post type-post status-publish format-standard hentry category-python-pandas tag-pandas tag-python tag-256\"><header class=\"entry-header\"><h2 class=\"entry-title\"><a href=\"http://www.crazyant.net/2546.html\" rel=\"bookmark\">Pandas的axis参数怎么理解？</a></h2></header><!-- .entry-header --><div class=\"entry-summary\"><p>axis参数非常的让人困惑难以理解，本视频我会用形象化的方式讲解一下这个参数，核心要诀就是axis那个轴会消失 &hellip; <a href=\"http://www.crazyant.net/2546.html\" class=\"more-link\">继续阅读<span 
class=\"screen-reader-text\">Pandas的axis参数怎么理解？</span></a></p></div><!-- .entry-summary --><footer class=\"entry-footer\"> <span class=\"posted-on\"><span class=\"screen-reader-text\">发布于 </span><a href=\"http://www.crazyant.net/2546.html\" rel=\"bookmark\"><time class=\"entry-date published\" datetime=\"2019-09-30T15:56:30+00:00\">2019-09-30</time><time class=\"updated\" datetime=\"2019-09-30T22:21:21+00:00\">2019-09-30</time></a></span><span class=\"byline\"><span class=\"author vcard\"><span class=\"screen-reader-text\">作者 </span><a class=\"url fn n\" href=\"http://www.crazyant.net/author/peishuai1987\">crazyant</a></span></span><span class=\"cat-links\"><span class=\"screen-reader-text\">分类 </span><a href=\"http://www.crazyant.net/category/python-pandas%e7%b3%bb%e5%88%97\" rel=\"category tag\">Python-Pandas系列</a></span><span class=\"tags-links\"><span class=\"screen-reader-text\">标签 </span><a href=\"http://www.crazyant.net/tag/pandas\" rel=\"tag\">pandas</a>、<a href=\"http://www.crazyant.net/tag/python\" rel=\"tag\">python</a>、<a href=\"http://www.crazyant.net/tag/%e6%95%b0%e6%8d%ae%e5%88%86%e6%9e%90\" rel=\"tag\">数据分析</a></span><span class=\"comments-link\"><a href=\"http://www.crazyant.net/2546.html#respond\"><span class=\"screen-reader-text\">于Pandas的axis参数怎么理解？</span>留下评论</a></span></footer><!-- .entry-footer --></article><!-- #post-2546 --><article id=\"post-2541\" class=\"post-2541 post type-post status-publish format-standard hentry category-python-pandas tag-pandas tag-python tag-256\"><header class=\"entry-header\"><h2 class=\"entry-title\"><a href=\"http://www.crazyant.net/2541.html\" rel=\"bookmark\">Pandas怎样处理字符串？</a></h2></header><!-- .entry-header --><div class=\"entry-summary\"><p>前面我们已经使用了字符串的处理函数： df[&#8220;bWendu&#8221;].str.replace &hellip; <a href=\"http://www.crazyant.net/2541.html\" class=\"more-link\">继续阅读<span class=\"screen-reader-text\">Pandas怎样处理字符串？</span></a></p></div><!-- .entry-summary --><footer class=\"entry-footer\"> <span 
class=\"posted-on\"><span class=\"screen-reader-text\">发布于 </span><a href=\"http://www.crazyant.net/2541.html\" rel=\"bookmark\"><time class=\"entry-date published\" datetime=\"2019-09-29T15:52:43+00:00\">2019-09-29</time><time class=\"updated\" datetime=\"2019-09-30T22:22:07+00:00\">2019-09-30</time></a></span><span class=\"byline\"><span class=\"author vcard\"><span class=\"screen-reader-text\">作者 </span><a class=\"url fn n\" href=\"http://www.crazyant.net/author/peishuai1987\">crazyant</a></span></span><span class=\"cat-links\"><span class=\"screen-reader-text\">分类 </span><a href=\"http://www.crazyant.net/category/python-pandas%e7%b3%bb%e5%88%97\" rel=\"category tag\">Python-Pandas系列</a></span><span class=\"tags-links\"><span class=\"screen-reader-text\">标签 </span><a href=\"http://www.crazyant.net/tag/pandas\" rel=\"tag\">pandas</a>、<a href=\"http://www.crazyant.net/tag/python\" rel=\"tag\">python</a>、<a href=\"http://www.crazyant.net/tag/%e6%95%b0%e6%8d%ae%e5%88%86%e6%9e%90\" rel=\"tag\">数据分析</a></span><span class=\"comments-link\"><a href=\"http://www.crazyant.net/2541.html#respond\"><span class=\"screen-reader-text\">于Pandas怎样处理字符串？</span>留下评论</a></span></footer><!-- .entry-footer --></article><!-- #post-2541 --><article id=\"post-2536\" class=\"post-2536 post type-post status-publish format-standard hentry category-python-pandas tag-pandas tag-python tag-256\"><header class=\"entry-header\"><h2 class=\"entry-title\"><a href=\"http://www.crazyant.net/2536.html\" rel=\"bookmark\">Pandas怎样对数据进行排序？</a></h2></header><!-- .entry-header --><div class=\"entry-summary\"><p>Series的排序： Series.sort_values(ascending=True, inplace=F &hellip; <a href=\"http://www.crazyant.net/2536.html\" class=\"more-link\">继续阅读<span class=\"screen-reader-text\">Pandas怎样对数据进行排序？</span></a></p></div><!-- .entry-summary --><footer class=\"entry-footer\"> <span class=\"posted-on\"><span class=\"screen-reader-text\">发布于 </span><a href=\"http://www.crazyant.net/2536.html\" 
rel=\"bookmark\"><time class=\"entry-date published\" datetime=\"2019-09-28T03:23:35+00:00\">2019-09-28</time><time class=\"updated\" datetime=\"2019-09-30T22:23:40+00:00\">2019-09-30</time></a></span><span class=\"byline\"><span class=\"author vcard\"><span class=\"screen-reader-text\">作者 </span><a class=\"url fn n\" href=\"http://www.crazyant.net/author/peishuai1987\">crazyant</a></span></span><span class=\"cat-links\"><span class=\"screen-reader-text\">分类 </span><a href=\"http://www.crazyant.net/category/python-pandas%e7%b3%bb%e5%88%97\" rel=\"category tag\">Python-Pandas系列</a></span><span class=\"tags-links\"><span class=\"screen-reader-text\">标签 </span><a href=\"http://www.crazyant.net/tag/pandas\" rel=\"tag\">pandas</a>、<a href=\"http://www.crazyant.net/tag/python\" rel=\"tag\">python</a>、<a href=\"http://www.crazyant.net/tag/%e6%95%b0%e6%8d%ae%e5%88%86%e6%9e%90\" rel=\"tag\">数据分析</a></span><span class=\"comments-link\"><a href=\"http://www.crazyant.net/2536.html#respond\"><span class=\"screen-reader-text\">于Pandas怎样对数据进行排序？</span>留下评论</a></span></footer><!-- .entry-footer --></article><!-- #post-2536 --><article id=\"post-2534\" class=\"post-2534 post type-post status-publish format-standard hentry category-python-solvedoubts tag-python tag-215\"><header class=\"entry-header\"><h2 class=\"entry-title\"><a href=\"http://www.crazyant.net/2534.html\" rel=\"bookmark\">CTR预估：(标签-权重)列表类特征怎么输入到模型？</a></h2></header><!-- .entry-header --><div class=\"entry-summary\"><p>问题： 要做一个CTR预估模型； 通过之前的数据挖掘，我得到了用户对标签的偏好数据： [(&#8216;标签1 &hellip; <a href=\"http://www.crazyant.net/2534.html\" class=\"more-link\">继续阅读<span class=\"screen-reader-text\">CTR预估：(标签-权重)列表类特征怎么输入到模型？</span></a></p></div><!-- .entry-summary --><footer class=\"entry-footer\"> <span class=\"posted-on\"><span class=\"screen-reader-text\">发布于 </span><a href=\"http://www.crazyant.net/2534.html\" rel=\"bookmark\"><time class=\"entry-date published\" datetime=\"2019-09-27T12:54:09+00:00\">2019-09-27</time><time 
class=\"updated\" datetime=\"2019-10-13T16:16:42+00:00\">2019-10-13</time></a></span><span class=\"byline\"><span class=\"author vcard\"><span class=\"screen-reader-text\">作者 </span><a class=\"url fn n\" href=\"http://www.crazyant.net/author/peishuai1987\">crazyant</a></span></span><span class=\"cat-links\"><span class=\"screen-reader-text\">分类 </span><a href=\"http://www.crazyant.net/category/python-solvedoubts\" rel=\"category tag\">Python-答疑解惑</a></span><span class=\"tags-links\"><span class=\"screen-reader-text\">标签 </span><a href=\"http://www.crazyant.net/tag/python\" rel=\"tag\">python</a>、<a href=\"http://www.crazyant.net/tag/%e6%9c%ba%e5%99%a8%e5%ad%a6%e4%b9%a0\" rel=\"tag\">机器学习</a></span><span class=\"comments-link\"><a href=\"http://www.crazyant.net/2534.html#respond\"><span class=\"screen-reader-text\">于CTR预估：(标签-权重)列表类特征怎么输入到模型？</span>留下评论</a></span></footer><!-- .entry-footer --></article><!-- #post-2534 --><article id=\"post-2532\" class=\"post-2532 post type-post status-publish format-standard hentry category-python-pandas tag-pandas tag-python tag-256\"><header class=\"entry-header\"><h2 class=\"entry-title\"><a href=\"http://www.crazyant.net/2532.html\" rel=\"bookmark\">Pandas对缺失值的处理</a></h2></header><!-- .entry-header --><div class=\"entry-summary\"><p>Pandas使用这些函数处理缺失值： * isnull和notnull：检测是否是空值，可用于df和serie &hellip; <a href=\"http://www.crazyant.net/2532.html\" class=\"more-link\">继续阅读<span class=\"screen-reader-text\">Pandas对缺失值的处理</span></a></p></div><!-- .entry-summary --><footer class=\"entry-footer\"> <span class=\"posted-on\"><span class=\"screen-reader-text\">发布于 </span><a href=\"http://www.crazyant.net/2532.html\" rel=\"bookmark\"><time class=\"entry-date published updated\" datetime=\"2019-09-27T00:17:27+00:00\">2019-09-27</time></a></span><span class=\"byline\"><span class=\"author vcard\"><span class=\"screen-reader-text\">作者 </span><a class=\"url fn n\" 
href=\"http://www.crazyant.net/author/peishuai1987\">crazyant</a></span></span><span class=\"cat-links\"><span class=\"screen-reader-text\">分类 </span><a href=\"http://www.crazyant.net/category/python-pandas%e7%b3%bb%e5%88%97\" rel=\"category tag\">Python-Pandas系列</a></span><span class=\"tags-links\"><span class=\"screen-reader-text\">标签 </span><a href=\"http://www.crazyant.net/tag/pandas\" rel=\"tag\">pandas</a>、<a href=\"http://www.crazyant.net/tag/python\" rel=\"tag\">python</a>、<a href=\"http://www.crazyant.net/tag/%e6%95%b0%e6%8d%ae%e5%88%86%e6%9e%90\" rel=\"tag\">数据分析</a></span><span class=\"comments-link\"><a href=\"http://www.crazyant.net/2532.html#respond\"><span class=\"screen-reader-text\">于Pandas对缺失值的处理</span>留下评论</a></span></footer><!-- .entry-footer --></article><!-- #post-2532 --><nav class=\"navigation pagination\" role=\"navigation\"><h2 class=\"screen-reader-text\">文章导航</h2><div class=\"nav-links\"><span aria-current=\\'page\\' class=\\'page-numbers current\\'><span class=\"meta-nav screen-reader-text\">页 </span>1</span> <a class=\\'page-numbers\\' href=\\'http://www.crazyant.net/page/2\\'><span class=\"meta-nav screen-reader-text\">页 </span>2</a> <span class=\"page-numbers dots\">&hellip;</span> <a class=\\'page-numbers\\' href=\\'http://www.crazyant.net/page/26\\'><span class=\"meta-nav screen-reader-text\">页 </span>26</a> <a class=\"next page-numbers\" href=\"http://www.crazyant.net/page/2\">下一页</a></div></nav></main><!-- .site-main --></div><!-- .content-area --></div><!-- .site-content --><footer id=\"colophon\" class=\"site-footer\" role=\"contentinfo\"><div class=\"site-info\"> <a href=\"https://cn.wordpress.org/\" class=\"imprint\"> 自豪地采用WordPress </a></div><!-- .site-info --></footer><!-- .site-footer --></div><!-- .site --> <script type=\\'text/javascript\\' src=\\'//cdn.bibblio.org/rcm/4.6/bib-related-content.js?ver=5.2.4\\'></script> <script type=\\'text/javascript\\'>/* <![CDATA[ */\\nvar tocplus = 
{\"visibility_show\":\"show\",\"visibility_hide\":\"hide\",\"width\":\"Auto\"};\\n/* ]]> */</script> <script type=\\'text/javascript\\' src=\\'http://www.crazyant.net/wp-content/plugins/table-of-contents-plus/front.min.js\\'></script> <script type=\\'text/javascript\\' src=\\'http://www.crazyant.net/wp-content/uploads/siteground-optimizer-assets/twentyfifteen-skip-link-focus-fix.min.js\\'></script> <script type=\\'text/javascript\\'>/* <![CDATA[ */\\nvar screenReaderText = {\"expand\":\"<span class=\\\\\"screen-reader-text\\\\\">\\\\u5c55\\\\u5f00\\\\u5b50\\\\u83dc\\\\u5355<\\\\/span>\",\"collapse\":\"<span class=\\\\\"screen-reader-text\\\\\">\\\\u6298\\\\u53e0\\\\u5b50\\\\u83dc\\\\u5355<\\\\/span>\"};\\n/* ]]> */</script> <script type=\\'text/javascript\\' src=\\'http://www.crazyant.net/wp-content/uploads/siteground-optimizer-assets/twentyfifteen-script.min.js\\'></script> <script type=\\'text/javascript\\' src=\\'http://www.crazyant.net/wp-content/plugins/easy-fancybox/js/jquery.fancybox.min.js\\'></script> <script type=\\'text/javascript\\'>var fb_timeout, fb_opts={\\'overlayShow\\':true,\\'hideOnOverlayClick\\':true,\\'showCloseButton\\':true,\\'margin\\':20,\\'centerOnScroll\\':false,\\'enableEscapeButton\\':true,\\'autoScale\\':true };\\nif(typeof easy_fancybox_handler===\\'undefined\\'){\\nvar easy_fancybox_handler=function(){\\njQuery(\\'.nofancybox,a.wp-block-file__button,a.pin-it-button,a[href*=\"pinterest.com/pin/create\"],a[href*=\"facebook.com/share\"],a[href*=\"twitter.com/share\"]\\').addClass(\\'nolightbox\\');\\n/* IMG */\\nvar fb_IMG_select=\\'a[href*=\".jpg\"]:not(.nolightbox,li.nolightbox>a),area[href*=\".jpg\"]:not(.nolightbox),a[href*=\".jpeg\"]:not(.nolightbox,li.nolightbox>a),area[href*=\".jpeg\"]:not(.nolightbox),a[href*=\".png\"]:not(.nolightbox,li.nolightbox>a),area[href*=\".png\"]:not(.nolightbox),a[href*=\".webp\"]:not(.nolightbox,li.nolightbox>a),area[href*=\".webp\"]:not(.nolightbox)\\';\\njQuery(fb_IMG_select).addClass(\\'fancybox 
image\\');\\nvar fb_IMG_sections=jQuery(\\'.gallery,.wp-block-gallery,.tiled-gallery,.wp-block-jetpack-tiled-gallery\\');\\nfb_IMG_sections.each(function(){jQuery(this).find(fb_IMG_select).attr(\\'rel\\',\\'gallery-\\'+fb_IMG_sections.index(this));});\\njQuery(\\'a.fancybox,area.fancybox,li.fancybox a\\').each(function(){jQuery(this).fancybox(jQuery.extend({},fb_opts,{\\'transitionIn\\':\\'elastic\\',\\'easingIn\\':\\'easeOutBack\\',\\'transitionOut\\':\\'elastic\\',\\'easingOut\\':\\'easeInBack\\',\\'opacity\\':false,\\'hideOnContentClick\\':false,\\'titleShow\\':true,\\'titlePosition\\':\\'over\\',\\'titleFromAlt\\':true,\\'showNavArrows\\':true,\\'enableKeyboardNav\\':true,\\'cyclic\\':false}))});};\\njQuery(\\'a.fancybox-close\\').on(\\'click\\',function(e){e.preventDefault();jQuery.fancybox.close()});\\n};\\nvar easy_fancybox_auto=function(){setTimeout(function(){jQuery(\\'#fancybox-auto\\').trigger(\\'click\\')},1000);};\\njQuery(easy_fancybox_handler);jQuery(document).on(\\'post-load\\',easy_fancybox_handler);\\njQuery(easy_fancybox_auto);</script> <script type=\\'text/javascript\\' src=\\'http://www.crazyant.net/wp-content/plugins/easy-fancybox/js/jquery.easing.min.js\\'></script> <script type=\\'text/javascript\\' src=\\'http://www.crazyant.net/wp-content/plugins/easy-fancybox/js/jquery.mousewheel.min.js\\'></script> <script type=\\'text/javascript\\' src=\\'http://www.crazyant.net/wp-includes/js/wp-embed.min.js\\'></script> </body></html>'"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "htmls[0]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Parse the HTML to extract the data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "def parse_single_html(html):\n",
    "    \"\"\"\n",
    "    Parse a single list page and extract its articles.\n",
    "    Returns a list of dicts with the keys \"title\", \"link\" and \"tags\" (a list of tag names).\n",
    "    \"\"\"\n",
    "    soup = BeautifulSoup(html, 'html.parser')\n",
    "    articles = soup.find_all(\"article\")\n",
    "    datas = []\n",
    "    for article in articles:\n",
    "        # Find the title hyperlink inside <h2 class=\"entry-title\">\n",
    "        title_node = (\n",
    "            article\n",
    "            .find(\"h2\", class_=\"entry-title\")\n",
    "            .find(\"a\")\n",
    "        )\n",
    "        title = title_node.get_text()\n",
    "        link = title_node[\"href\"]\n",
    "\n",
    "        # Find the tag links inside the article footer\n",
    "        tag_nodes = (\n",
    "            article\n",
    "            .find(\"footer\", class_=\"entry-footer\")\n",
    "            .find(\"span\", class_=\"tags-links\")\n",
    "            .find_all(\"a\")\n",
    "        )\n",
    "        tags = [tag_node.get_text() for tag_node in tag_nodes]\n",
    "        datas.append(\n",
    "            {\"title\": title, \"link\": link, \"tags\": tags}\n",
    "        )\n",
    "    return datas"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[{'link': 'http://www.crazyant.net/2585.html',\n",
      "  'tags': ['python', 'tensorflow', '特征工程'],\n",
      "  'title': 'Tensorflow怎样接收变长列表特征'},\n",
      " {'link': 'http://www.crazyant.net/2583.html',\n",
      "  'tags': ['pandas', 'python', '数据分析'],\n",
      "  'title': 'Pandas实现数据的合并concat'},\n",
      " {'link': 'http://www.crazyant.net/2574.html',\n",
      "  'tags': ['pandas', 'python', '数据分析'],\n",
      "  'title': 'Pandas的Index索引有什么用途？'},\n",
      " {'link': 'http://www.crazyant.net/2564.html',\n",
      "  'tags': ['python', '机器学习'],\n",
      "  'title': '机器学习常用数据集大全'},\n",
      " {'link': 'http://www.crazyant.net/2561.html',\n",
      "  'tags': ['数据分析'],\n",
      "  'title': '一个数据科学家的修炼路径'},\n",
      " {'link': 'http://www.crazyant.net/2546.html',\n",
      "  'tags': ['pandas', 'python', '数据分析'],\n",
      "  'title': 'Pandas的axis参数怎么理解？'},\n",
      " {'link': 'http://www.crazyant.net/2541.html',\n",
      "  'tags': ['pandas', 'python', '数据分析'],\n",
      "  'title': 'Pandas怎样处理字符串？'},\n",
      " {'link': 'http://www.crazyant.net/2536.html',\n",
      "  'tags': ['pandas', 'python', '数据分析'],\n",
      "  'title': 'Pandas怎样对数据进行排序？'},\n",
      " {'link': 'http://www.crazyant.net/2534.html',\n",
      "  'tags': ['python', '机器学习'],\n",
      "  'title': 'CTR预估：(标签-权重)列表类特征怎么输入到模型？'},\n",
      " {'link': 'http://www.crazyant.net/2532.html',\n",
      "  'tags': ['pandas', 'python', '数据分析'],\n",
      "  'title': 'Pandas对缺失值的处理'}]\n"
     ]
    }
   ],
   "source": [
    "pprint.pprint(parse_single_html(htmls[0]))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Parse every downloaded HTML page and collect the results\n",
    "all_datas = []\n",
    "for html in htmls:\n",
    "    all_datas.extend(parse_single_html(html))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[{'title': 'Tensorflow怎样接收变长列表特征',\n",
       "  'link': 'http://www.crazyant.net/2585.html',\n",
       "  'tags': ['python', 'tensorflow', '特征工程']},\n",
       " {'title': 'Pandas实现数据的合并concat',\n",
       "  'link': 'http://www.crazyant.net/2583.html',\n",
       "  'tags': ['pandas', 'python', '数据分析']},\n",
       " {'title': 'Pandas的Index索引有什么用途？',\n",
       "  'link': 'http://www.crazyant.net/2574.html',\n",
       "  'tags': ['pandas', 'python', '数据分析']},\n",
       " {'title': '机器学习常用数据集大全',\n",
       "  'link': 'http://www.crazyant.net/2564.html',\n",
       "  'tags': ['python', '机器学习']},\n",
       " {'title': '一个数据科学家的修炼路径',\n",
       "  'link': 'http://www.crazyant.net/2561.html',\n",
       "  'tags': ['数据分析']},\n",
       " {'title': 'Pandas的axis参数怎么理解？',\n",
       "  'link': 'http://www.crazyant.net/2546.html',\n",
       "  'tags': ['pandas', 'python', '数据分析']},\n",
       " {'title': 'Pandas怎样处理字符串？',\n",
       "  'link': 'http://www.crazyant.net/2541.html',\n",
       "  'tags': ['pandas', 'python', '数据分析']},\n",
       " {'title': 'Pandas怎样对数据进行排序？',\n",
       "  'link': 'http://www.crazyant.net/2536.html',\n",
       "  'tags': ['pandas', 'python', '数据分析']},\n",
       " {'title': 'CTR预估：(标签-权重)列表类特征怎么输入到模型？',\n",
       "  'link': 'http://www.crazyant.net/2534.html',\n",
       "  'tags': ['python', '机器学习']},\n",
       " {'title': 'Pandas对缺失值的处理',\n",
       "  'link': 'http://www.crazyant.net/2532.html',\n",
       "  'tags': ['pandas', 'python', '数据分析']},\n",
       " {'title': 'Pandas的SettingWithCopyWarning报警怎么回事？',\n",
       "  'link': 'http://www.crazyant.net/2528.html',\n",
       "  'tags': ['pandas', 'python', '数据分析']},\n",
       " {'title': '3分钟Python爬取9000张表情包图片',\n",
       "  'link': 'http://www.crazyant.net/2525.html',\n",
       "  'tags': ['python', '爬虫']},\n",
       " {'title': 'Pandas系列 – 怎样新增数据列？',\n",
       "  'link': 'http://www.crazyant.net/2523.html',\n",
       "  'tags': ['pandas', 'python']},\n",
       " {'title': 'Pandas系列 – 数据统计函数',\n",
       "  'link': 'http://www.crazyant.net/2521.html',\n",
       "  'tags': ['pandas', 'python', '数据分析']},\n",
       " {'title': 'Pandas怎样根据码表更新ID对应的名称？',\n",
       "  'link': 'http://www.crazyant.net/2517.html',\n",
       "  'tags': ['pandas']},\n",
       " {'title': 'Pandas怎样按条件删除行？',\n",
       "  'link': 'http://www.crazyant.net/2515.html',\n",
       "  'tags': ['pandas', 'python']},\n",
       " {'title': 'Pandas系列-查询数据的5种方法',\n",
       "  'link': 'http://www.crazyant.net/2506.html',\n",
       "  'tags': ['pandas', 'python', '数据分析']},\n",
       " {'title': 'Pandas系列-DataFrame和Series数据结构',\n",
       "  'link': 'http://www.crazyant.net/2502.html',\n",
       "  'tags': ['pandas', 'python', '数据分析']},\n",
       " {'title': 'Pandas系列-读取csv/excel/mysql数据',\n",
       "  'link': 'http://www.crazyant.net/2499.html',\n",
       "  'tags': ['pandas', 'python', '数据分析']},\n",
       " {'title': 'Spark使用Java开发遇到的那些类型错误',\n",
       "  'link': 'http://www.crazyant.net/2469.html',\n",
       "  'tags': ['java', 'spark']},\n",
       " {'title': '推荐系统：实现文章相似推荐的简单实例',\n",
       "  'link': 'http://www.crazyant.net/2454.html',\n",
       "  'tags': ['pandas', 'python', 'sklearn', '推荐系统']},\n",
       " {'title': 'Spark使用word2vec训练item2vec实现内容相关推荐',\n",
       "  'link': 'http://www.crazyant.net/2447.html',\n",
       "  'tags': ['item2vec', 'java', '推荐系统']},\n",
       " {'title': 'Pandas中对轴axis=0和axis=1的理解',\n",
       "  'link': 'http://www.crazyant.net/2434.html',\n",
       "  'tags': ['pandas', 'python']},\n",
       " {'title': 'Flask使用Pyecharts在单个页面展示多个图表',\n",
       "  'link': 'http://www.crazyant.net/2419.html',\n",
       "  'tags': ['echarts', 'flask', 'pyecharts', 'python']},\n",
       " {'title': '听樊登的《非暴力沟通》',\n",
       "  'link': 'http://www.crazyant.net/2404.html',\n",
       "  'tags': ['程序人生']},\n",
       " {'title': 'Java和Python使用Grpc访问Tensorflow的Serving代码',\n",
       "  'link': 'http://www.crazyant.net/2367.html',\n",
       "  'tags': ['java', 'python', 'tensorflow']},\n",
       " {'title': '推荐系统：怎样实现内容相似推荐',\n",
       "  'link': 'http://www.crazyant.net/2351.html',\n",
       "  'tags': ['推荐系统']},\n",
       " {'title': 'Flask怎样从其他Python文件导入app.route视图函数',\n",
       "  'link': 'http://www.crazyant.net/2343.html',\n",
       "  'tags': ['flask', 'python']},\n",
       " {'title': '我为什么从工程转了算法？',\n",
       "  'link': 'http://www.crazyant.net/2336.html',\n",
       "  'tags': ['大数据', '程序人生', '算法']},\n",
       " {'title': '推荐系统：爱奇艺知识推荐系统架构',\n",
       "  'link': 'http://www.crazyant.net/2324.html',\n",
       "  'tags': ['推荐系统']},\n",
       " {'title': 'Spark使用JAVA编写自定义函数修改DataFrame',\n",
       "  'link': 'http://www.crazyant.net/2303.html',\n",
       "  'tags': ['java', 'mysql', 'spark']},\n",
       " {'title': 'tensorflow怎样输入具有多个值的特征',\n",
       "  'link': 'http://www.crazyant.net/2301.html',\n",
       "  'tags': ['python', 'tensorflow']},\n",
       " {'title': 'Python3用scan和delete命令批量清理redis数据',\n",
       "  'link': 'http://www.crazyant.net/2283.html',\n",
       "  'tags': ['python', 'redis']},\n",
       " {'title': 'CentOS自己编译安装Python3的命令',\n",
       "  'link': 'http://www.crazyant.net/2273.html',\n",
       "  'tags': ['python', 'shell']},\n",
       " {'title': 'PyCharm开发PySpark程序的配置和实例',\n",
       "  'link': 'http://www.crazyant.net/2261.html',\n",
       "  'tags': ['pyspark', 'python', 'spark']},\n",
       " {'title': 'Spark数据倾斜解决方法',\n",
       "  'link': 'http://www.crazyant.net/2231.html',\n",
       "  'tags': ['spark']},\n",
       " {'title': '读书笔记 – 数据驱动从方法到实践',\n",
       "  'link': 'http://www.crazyant.net/2194.html',\n",
       "  'tags': ['人工智能', '大数据', '数据驱动']},\n",
       " {'title': '使用PaddlePaddle搭建卷积网络做文本数据分类',\n",
       "  'link': 'http://www.crazyant.net/2177.html',\n",
       "  'tags': ['paddlepaddle', '机器学习']},\n",
       " {'title': '使用Kmeans对Word2vec的输出做聚类',\n",
       "  'link': 'http://www.crazyant.net/2167.html',\n",
       "  'tags': ['机器学习', '聚类']},\n",
       " {'title': 'Hive实现返回MAP的UDF',\n",
       "  'link': 'http://www.crazyant.net/2160.html',\n",
       "  'tags': ['hive']},\n",
       " {'title': 'Python高级编程技巧',\n",
       "  'link': 'http://www.crazyant.net/2150.html',\n",
       "  'tags': ['python']},\n",
       " {'title': '快速找到Tomcat中最耗CPU的线程',\n",
       "  'link': 'http://www.crazyant.net/2145.html',\n",
       "  'tags': ['java', 'jvm', 'tomcat']},\n",
       " {'title': 'Java线程池ThreadPoolExecutor详解',\n",
       "  'link': 'http://www.crazyant.net/2124.html',\n",
       "  'tags': ['java']},\n",
       " {'title': 'Zookeeper并不保证读取的是最新数据',\n",
       "  'link': 'http://www.crazyant.net/2120.html',\n",
       "  'tags': ['zookeeper']},\n",
       " {'title': 'Mybatis源码解读-初始化过程详解',\n",
       "  'link': 'http://www.crazyant.net/2089.html',\n",
       "  'tags': ['mybatis']},\n",
       " {'title': '怎样借助Python爬虫给宝宝起个好名字',\n",
       "  'link': 'http://www.crazyant.net/2076.html',\n",
       "  'tags': ['python', '爬虫']},\n",
       " {'title': 'Mybatis源码解读-设计模式总结',\n",
       "  'link': 'http://www.crazyant.net/2022.html',\n",
       "  'tags': ['mybatis', '设计模式']},\n",
       " {'title': '打工者心态、主人公意识、个人公司品牌',\n",
       "  'link': 'http://www.crazyant.net/2012.html',\n",
       "  'tags': ['程序人生']},\n",
       " {'title': 'Tomcat内存分析相关方法(jmap和mat)',\n",
       "  'link': 'http://www.crazyant.net/1980.html',\n",
       "  'tags': ['java', 'tomcat']},\n",
       " {'title': '如此重要但是经常被忽视的代码架构！',\n",
       "  'link': 'http://www.crazyant.net/1973.html',\n",
       "  'tags': ['技术架构']},\n",
       " {'title': 'Stay hungry, Stay foolish',\n",
       "  'link': 'http://www.crazyant.net/1964.html',\n",
       "  'tags': ['程序人生']},\n",
       " {'title': 'MAC挂载NTFS移动硬盘进行读写操作',\n",
       "  'link': 'http://www.crazyant.net/1961.html',\n",
       "  'tags': ['mac']},\n",
       " {'title': '工程师的月亮和六便士',\n",
       "  'link': 'http://www.crazyant.net/1957.html',\n",
       "  'tags': ['程序人生']},\n",
       " {'title': 'MAC环境堪比Visio的画图神器',\n",
       "  'link': 'http://www.crazyant.net/1946.html',\n",
       "  'tags': ['mac']},\n",
       " {'title': 'Log4j将不同Package的日志输出到不同的文件的方法',\n",
       "  'link': 'http://www.crazyant.net/1931.html',\n",
       "  'tags': ['java', 'log4j']},\n",
       " {'title': '数据处理中提升性能的方法-引入并发但是避免同步',\n",
       "  'link': 'http://www.crazyant.net/1922.html',\n",
       "  'tags': ['java', 'php', 'python', 'shell', '大数据', '数据处理']},\n",
       " {'title': 'MySQL导入导出数据时遇到Tab符号和换行符号怎么办？',\n",
       "  'link': 'http://www.crazyant.net/1901.html',\n",
       "  'tags': ['mysql', 'python']},\n",
       " {'title': '使用PHPUnit编写PHP单元测试的方法',\n",
       "  'link': 'http://www.crazyant.net/1898.html',\n",
       "  'tags': ['php']},\n",
       " {'title': 'Bash Shell怎样检查文件是否存在？',\n",
       "  'link': 'http://www.crazyant.net/1895.html',\n",
       "  'tags': ['linux', 'shell']},\n",
       " {'title': 'Python使用unittest实现简单的单元测试实例',\n",
       "  'link': 'http://www.crazyant.net/1890.html',\n",
       "  'tags': ['python', '单测']},\n",
       " {'title': '将Maven工程打包成可执行JAR包的方法',\n",
       "  'link': 'http://www.crazyant.net/1886.html',\n",
       "  'tags': ['java', 'maven']},\n",
       " {'title': 'Java线程死亡的几种情况',\n",
       "  'link': 'http://www.crazyant.net/1861.html',\n",
       "  'tags': ['java']},\n",
       " {'title': '通过JVM堆栈分析出现大量线程的原因',\n",
       "  'link': 'http://www.crazyant.net/1858.html',\n",
       "  'tags': ['java', 'jvm']},\n",
       " {'title': '想要加悲观锁可是数据行还不存在怎么办？',\n",
       "  'link': 'http://www.crazyant.net/1835.html',\n",
       "  'tags': ['java', '并发控制']},\n",
       " {'title': 'Java堆溢出OutOfMemoryError之代码实例和原因分析',\n",
       "  'link': 'http://www.crazyant.net/1810.html',\n",
       "  'tags': ['java', 'jvm']},\n",
       " {'title': 'Python中文转拼音代码(支持全拼和首字母缩写)',\n",
       "  'link': 'http://www.crazyant.net/1789.html',\n",
       "  'tags': ['python']},\n",
       " {'title': '使用javap命令或者eclipse的Bytecode visualizer插件阅读java字节码文件',\n",
       "  'link': 'http://www.crazyant.net/1784.html',\n",
       "  'tags': ['java']},\n",
       " {'title': 'Java怎样单测void类型的方法？',\n",
       "  'link': 'http://www.crazyant.net/1782.html',\n",
       "  'tags': ['java']},\n",
       " {'title': '《解忧杂货店》- 解答心中已经有结论的疑问',\n",
       "  'link': 'http://www.crazyant.net/1777.html',\n",
       "  'tags': ['程序人生']},\n",
       " {'title': '有了事务为什么还需要乐观锁和悲观锁',\n",
       "  'link': 'http://www.crazyant.net/1763.html',\n",
       "  'tags': ['mysql', '数据库']},\n",
       " {'title': '数据库并发控制机制的理解',\n",
       "  'link': 'http://www.crazyant.net/1741.html',\n",
       "  'tags': ['事务']},\n",
       " {'title': '读书破万卷，代码如有神',\n",
       "  'link': 'http://www.crazyant.net/1735.html',\n",
       "  'tags': ['java', '阅读']},\n",
       " {'title': '《Spring in action》3rd中SpringPizza项目的运行方法',\n",
       "  'link': 'http://www.crazyant.net/1722.html',\n",
       "  'tags': ['spring']},\n",
       " {'title': '做设计就像创世界',\n",
       "  'link': 'http://www.crazyant.net/1712.html',\n",
       "  'tags': ['程序人生', '设计']},\n",
       " {'title': 'Python使用list字段模式或者dict字段模式读取文件的方法',\n",
       "  'link': 'http://www.crazyant.net/1707.html',\n",
       "  'tags': ['python']},\n",
       " {'title': 'Java怎样创建两个KEY（key-pair）的MAP',\n",
       "  'link': 'http://www.crazyant.net/1703.html',\n",
       "  'tags': ['java']},\n",
       " {'title': '《超体》中的哲学',\n",
       "  'link': 'http://www.crazyant.net/1697.html',\n",
       "  'tags': ['思考']},\n",
       " {'title': 'Java枚举类型代码的二逼写法和艺术写法',\n",
       "  'link': 'http://www.crazyant.net/1689.html',\n",
       "  'tags': ['java']},\n",
       " {'title': 'Python操作MySQL视频教程',\n",
       "  'link': 'http://www.crazyant.net/1664.html',\n",
       "  'tags': ['mysql', 'python', '视频']},\n",
       " {'title': 'Hive开发经验问答式总结',\n",
       "  'link': 'http://www.crazyant.net/1625.html',\n",
       "  'tags': ['hive']},\n",
       " {'title': '将普通Maven Spring项目转换成Web项目的方法',\n",
       "  'link': 'http://www.crazyant.net/1607.html',\n",
       "  'tags': ['java', 'maven']},\n",
       " {'title': 'Hive取非Group by字段数据的方法',\n",
       "  'link': 'http://www.crazyant.net/1600.html',\n",
       "  'tags': ['hive']},\n",
       " {'title': 'MySQL执行Select语句将结果导出到文件的方法',\n",
       "  'link': 'http://www.crazyant.net/1587.html',\n",
       "  'tags': ['mysql']},\n",
       " {'title': 'Hive的left join、left outer join和left semi join三者的区别',\n",
       "  'link': 'http://www.crazyant.net/1470.html',\n",
       "  'tags': ['hive']},\n",
       " {'title': '将网站的创意变成钱的过程',\n",
       "  'link': 'http://www.crazyant.net/1465.html',\n",
       "  'tags': ['站长']},\n",
       " {'title': '从产品和技术的对比想到的',\n",
       "  'link': 'http://www.crazyant.net/1459.html',\n",
       "  'tags': ['程序人生']},\n",
       " {'title': 'Hive中Order by和Sort by的区别是什么?',\n",
       "  'link': 'http://www.crazyant.net/1456.html',\n",
       "  'tags': ['hive']},\n",
       " {'title': '向Hive程序传递变量的三种方法',\n",
       "  'link': 'http://www.crazyant.net/1451.html',\n",
       "  'tags': ['hive']},\n",
       " {'title': '把HIVE程序优化30倍的经验',\n",
       "  'link': 'http://www.crazyant.net/1440.html',\n",
       "  'tags': ['hive']},\n",
       " {'title': 'Hive使用TRANSFORM运行Python脚本总结',\n",
       "  'link': 'http://www.crazyant.net/1437.html',\n",
       "  'tags': ['hive']},\n",
       " {'title': 'MySQL 查看数据库中每个表占用的空间大小',\n",
       "  'link': 'http://www.crazyant.net/1428.html',\n",
       "  'tags': ['mysql']},\n",
       " {'title': 'Java使用lombok自动生成getter和setter方法',\n",
       "  'link': 'http://www.crazyant.net/1426.html',\n",
       "  'tags': ['java']},\n",
       " {'title': 'MapReduce文件切分个数计算方法',\n",
       "  'link': 'http://www.crazyant.net/1423.html',\n",
       "  'tags': ['hadoop']},\n",
       " {'title': '《大数据时代》是一部科幻小说',\n",
       "  'link': 'http://www.crazyant.net/1413.html',\n",
       "  'tags': ['hadoop']},\n",
       " {'title': '[转]Hive中对group结果分组取limit N值的实现',\n",
       "  'link': 'http://www.crazyant.net/1409.html',\n",
       "  'tags': ['hive']},\n",
       " {'title': 'HIVE的几个使用技巧',\n",
       "  'link': 'http://www.crazyant.net/1404.html',\n",
       "  'tags': ['hive']},\n",
       " {'title': 'Python批量重命名文件的方法',\n",
       "  'link': 'http://www.crazyant.net/1397.html',\n",
       "  'tags': ['python']},\n",
       " {'title': 'Python内置函数map、reduce、filter在文本处理中的应用',\n",
       "  'link': 'http://www.crazyant.net/1390.html',\n",
       "  'tags': ['python']},\n",
       " {'title': 'chrome自动刷新网页插件：Auto Refresh Plus',\n",
       "  'link': 'http://www.crazyant.net/1372.html',\n",
       "  'tags': ['chrome']},\n",
       " {'title': 'MySQL数据导入导出实例教程手册',\n",
       "  'link': 'http://www.crazyant.net/1355.html',\n",
       "  'tags': ['mysql']},\n",
       " {'title': 'MySQL一条语句更新多个表的方法',\n",
       "  'link': 'http://www.crazyant.net/1345.html',\n",
       "  'tags': ['mysql']},\n",
       " {'title': 'mysql根据A表更新B表的方法',\n",
       "  'link': 'http://www.crazyant.net/1337.html',\n",
       "  'tags': ['mysql', 'python']},\n",
       " {'title': '[织梦DEDE迁移]读取织梦MySQL生成所有文章链接',\n",
       "  'link': 'http://www.crazyant.net/1326.html',\n",
       "  'tags': ['织梦']},\n",
       " {'title': 'Python访问MySQL封装的常用类',\n",
       "  'link': 'http://www.crazyant.net/1321.html',\n",
       "  'tags': ['mysql', 'python']},\n",
       " {'title': 'python执行shell的两种方法',\n",
       "  'link': 'http://www.crazyant.net/1319.html',\n",
       "  'tags': ['python', 'shell']},\n",
       " {'title': 'Python封装的常用日期函数',\n",
       "  'link': 'http://www.crazyant.net/1309.html',\n",
       "  'tags': ['python']},\n",
       " {'title': 'python子类调用父类的方法',\n",
       "  'link': 'http://www.crazyant.net/1303.html',\n",
       "  'tags': ['python']},\n",
       " {'title': 'wordpress按层级方式显示分类链接的方法',\n",
       "  'link': 'http://www.crazyant.net/1297.html',\n",
       "  'tags': ['wordpress']},\n",
       " {'title': 'Firefox数据采集插件大全',\n",
       "  'link': 'http://www.crazyant.net/1292.html',\n",
       "  'tags': ['数据采集', '爬虫']},\n",
       " {'title': 'Python生成文件md5校验值函数',\n",
       "  'link': 'http://www.crazyant.net/1216.html',\n",
       "  'tags': ['python']},\n",
       " {'title': '网站从织梦DEDECMS迁移到WordPress过程以及URL重定向方法',\n",
       "  'link': 'http://www.crazyant.net/1214.html',\n",
       "  'tags': ['织梦']},\n",
       " {'title': 'shell/hadoop/hive一些有用命令收集',\n",
       "  'link': 'http://www.crazyant.net/1209.html',\n",
       "  'tags': ['hadoop', 'hive', 'mysql', 'shell']},\n",
       " {'title': 'Hive开发中使用变量的两种方法',\n",
       "  'link': 'http://www.crazyant.net/1203.html',\n",
       "  'tags': ['hive']},\n",
       " {'title': 'hive从查询中获取数据插入到表或动态分区',\n",
       "  'link': 'http://www.crazyant.net/1197.html',\n",
       "  'tags': ['hive']},\n",
       " {'title': 'Hive元数据存于mysql中文乱码解决',\n",
       "  'link': 'http://www.crazyant.net/1193.html',\n",
       "  'tags': ['hive', 'mysql']},\n",
       " {'title': '为eclipse安装python、shell开发环境和SVN插件',\n",
       "  'link': 'http://www.crazyant.net/1185.html',\n",
       "  'tags': ['python', 'shell']},\n",
       " {'title': 'hadoop第一个程序WordCount.java的编译运行过程',\n",
       "  'link': 'http://www.crazyant.net/1144.html',\n",
       "  'tags': ['hadoop']},\n",
       " {'title': 'MYSQL向数据表插入默认字段值的方法',\n",
       "  'link': 'http://www.crazyant.net/1129.html',\n",
       "  'tags': ['mysql']},\n",
       " {'title': 'Hadoop-Streaming实战经验及问题解决方法总结',\n",
       "  'link': 'http://www.crazyant.net/1122.html',\n",
       "  'tags': ['hadoop']},\n",
       " {'title': 'Hadoop之使用python实现数据集合间join操作',\n",
       "  'link': 'http://www.crazyant.net/1112.html',\n",
       "  'tags': ['hadoop', 'python']},\n",
       " {'title': 'Rational Rose根据Java代码自动生成类图（教程和错误解决）',\n",
       "  'link': 'http://www.crazyant.net/1094.html',\n",
       "  'tags': ['java']},\n",
       " {'title': 'MathType(数学公式编辑器) 汉化绿色版V6.7下载',\n",
       "  'link': 'http://www.crazyant.net/1088.html',\n",
       "  'tags': ['mathtype']},\n",
       " {'title': 'JSP使用JNA调用DLL函数遇到的几个问题',\n",
       "  'link': 'http://www.crazyant.net/1072.html',\n",
       "  'tags': ['java']},\n",
       " {'title': '读《疯狂的站长》- 回顾反思我的个人站长路',\n",
       "  'link': 'http://www.crazyant.net/1066.html',\n",
       "  'tags': ['站长']},\n",
       " {'title': '给计算机专业求职的同学推荐几本书',\n",
       "  'link': 'http://www.crazyant.net/1064.html',\n",
       "  'tags': ['程序人生']},\n",
       " {'title': 'MySQL数据库存储过程教程',\n",
       "  'link': 'http://www.crazyant.net/1061.html',\n",
       "  'tags': ['mysql']},\n",
       " {'title': 'Magento获取指定分类下的所有子分类信息',\n",
       "  'link': 'http://www.crazyant.net/1057.html',\n",
       "  'tags': ['magento', 'php']},\n",
       " {'title': 'WIN7使用VisualSVN建立SVN服务器',\n",
       "  'link': 'http://www.crazyant.net/1055.html',\n",
       "  'tags': ['svn', 'win7']},\n",
       " {'title': '织梦DEDECMS简洁蓝色模板免费下载（资讯文章类）',\n",
       "  'link': 'http://www.crazyant.net/1044.html',\n",
       "  'tags': ['织梦']},\n",
       " {'title': 'Django基本命令最全收集',\n",
       "  'link': 'http://www.crazyant.net/1036.html',\n",
       "  'tags': ['django', 'python']},\n",
       " {'title': '2012年百度、腾讯、微软、奇虎360、人人、去哪网找工作经历总结',\n",
       "  'link': 'http://www.crazyant.net/1030.html',\n",
       "  'tags': ['程序人生']},\n",
       " {'title': 'PHP对数组的高级遍历和操作处理方法',\n",
       "  'link': 'http://www.crazyant.net/1022.html',\n",
       "  'tags': ['php']},\n",
       " {'title': '使用PHP连接、操纵Memcached的原理和教程',\n",
       "  'link': 'http://www.crazyant.net/1014.html',\n",
       "  'tags': ['memcached', 'php', '数据库']},\n",
       " {'title': 'Django关于站点管理Admin Site的常见问题解决方法',\n",
       "  'link': 'http://www.crazyant.net/1005.html',\n",
       "  'tags': ['django', 'python']},\n",
       " {'title': '对Django框架架构和Request/Response处理流程的分析',\n",
       "  'link': 'http://www.crazyant.net/1001.html',\n",
       "  'tags': ['django', 'python']},\n",
       " {'title': 'PHP开发者最好的学习资源收集',\n",
       "  'link': 'http://www.crazyant.net/970.html',\n",
       "  'tags': ['php']},\n",
       " {'title': 'Ubuntu10.10 Server+Nginx+Django+Postgresql安装步骤',\n",
       "  'link': 'http://www.crazyant.net/955.html',\n",
       "  'tags': ['django', 'ngnix', 'ubuntu']},\n",
       " {'title': 'PHP和MySQL处理树状、分级、无限分类、分层数据的方法',\n",
       "  'link': 'http://www.crazyant.net/930.html',\n",
       "  'tags': ['mysql', 'php']},\n",
       " {'title': 'PHP创建和解析JSON数据的方法',\n",
       "  'link': 'http://www.crazyant.net/920.html',\n",
       "  'tags': ['json', 'php']},\n",
       " {'title': '程序员做开发，前台、后台、测试哪个累？',\n",
       "  'link': 'http://www.crazyant.net/914.html',\n",
       "  'tags': ['程序人生']},\n",
       " {'title': 'PHP的验证码实现（w3schools推荐）',\n",
       "  'link': 'http://www.crazyant.net/912.html',\n",
       "  'tags': ['php', '验证码']},\n",
       " {'title': '国外10个非常有趣的PHP博客',\n",
       "  'link': 'http://www.crazyant.net/897.html',\n",
       "  'tags': ['php']},\n",
       " {'title': 'PHP读写Word文件的最佳类库收集',\n",
       "  'link': 'http://www.crazyant.net/886.html',\n",
       "  'tags': ['php', 'word']},\n",
       " {'title': '2012年度读写Excel文件的最佳PHP类库收集',\n",
       "  'link': 'http://www.crazyant.net/874.html',\n",
       "  'tags': ['excel', 'php']},\n",
       " {'title': '使用Google搭建自己的SVN或Git或Mercurial代码服务器之完美教程',\n",
       "  'link': 'http://www.crazyant.net/855.html',\n",
       "  'tags': ['git', 'svn']},\n",
       " {'title': 'PHP远程操纵WordPress的方法(流程剖析）',\n",
       "  'link': 'http://www.crazyant.net/821.html',\n",
       "  'tags': ['php', 'wordpress']},\n",
       " {'title': 'Python模拟登陆新浪微博并实现投票功能',\n",
       "  'link': 'http://www.crazyant.net/818.html',\n",
       "  'tags': ['python', '爬虫']},\n",
       " {'title': 'Django中定制自己的User和Group管理模块（类似对admin的二次开发）',\n",
       "  'link': 'http://www.crazyant.net/814.html',\n",
       "  'tags': ['django', 'python']},\n",
       " {'title': 'django1.4设置模板路径和CSS,JS,image等路径的方法',\n",
       "  'link': 'http://www.crazyant.net/811.html',\n",
       "  'tags': ['django', 'python']},\n",
       " {'title': '珠玑：在仔细研究数据的基础上得出程序的结构',\n",
       "  'link': 'http://www.crazyant.net/808.html',\n",
       "  'tags': ['算法']},\n",
       " {'title': 'Python使用cookielib和urllib2模拟登陆新浪微博并抓取数据',\n",
       "  'link': 'http://www.crazyant.net/796.html',\n",
       "  'tags': ['python', '模拟登陆', '爬虫']},\n",
       " {'title': '《SEO实战密码》高清电子版PDF下载地址（SEO学习必备）',\n",
       "  'link': 'http://www.crazyant.net/790.html',\n",
       "  'tags': ['seo']},\n",
       " {'title': '重装Win7后恢复和找回Ubuntu启动项',\n",
       "  'link': 'http://www.crazyant.net/781.html',\n",
       "  'tags': ['linux', 'ubuntu', 'win7']},\n",
       " {'title': 'putty连接linux出现中文乱码的解决方法',\n",
       "  'link': 'http://www.crazyant.net/756.html',\n",
       "  'tags': ['python', 'ubuntu']},\n",
       " {'title': 'Ubuntu 安装 PostgreSQL 和 python-psycopg2基础教程（以及错误解决）',\n",
       "  'link': 'http://www.crazyant.net/754.html',\n",
       "  'tags': ['ubuntu']},\n",
       " {'title': 'eclipse远程发布代码的方法（SSH自动同步）',\n",
       "  'link': 'http://www.crazyant.net/749.html',\n",
       "  'tags': ['eclipse', 'python']},\n",
       " {'title': 'python在linux下安装方法（解决旧版本冲突）',\n",
       "  'link': 'http://www.crazyant.net/747.html',\n",
       "  'tags': ['linux', 'python']},\n",
       " {'title': '2012年度PHP最佳类库收集',\n",
       "  'link': 'http://www.crazyant.net/740.html',\n",
       "  'tags': ['php']},\n",
       " {'title': 'php判断远程文件或网站是否能打开',\n",
       "  'link': 'http://www.crazyant.net/734.html',\n",
       "  'tags': ['php']},\n",
       " {'title': 'PHP数据采集之使用CURL、DOMDocument和DOMXPath',\n",
       "  'link': 'http://www.crazyant.net/731.html',\n",
       "  'tags': ['php', '爬虫']},\n",
       " {'title': 'Python关于apply的知识',\n",
       "  'link': 'http://www.crazyant.net/724.html',\n",
       "  'tags': ['python']},\n",
       " {'title': 'Python知识之什么是*args和**kwargs？',\n",
       "  'link': 'http://www.crazyant.net/722.html',\n",
       "  'tags': ['python']},\n",
       " {'title': 'PHP魔法方法之__sleep()方法和__wakeup()方法',\n",
       "  'link': 'http://www.crazyant.net/717.html',\n",
       "  'tags': ['php']},\n",
       " {'title': 'Python中的操作符重载',\n",
       "  'link': 'http://www.crazyant.net/712.html',\n",
       "  'tags': ['python']},\n",
       " {'title': '数据采集简单示例：采集爱帮网电话号码',\n",
       "  'link': 'http://www.crazyant.net/707.html',\n",
       "  'tags': ['php', '爬虫']},\n",
       " {'title': '数据采集技术之在Python中Libxml模块安装与使用XPath',\n",
       "  'link': 'http://www.crazyant.net/700.html',\n",
       "  'tags': ['python', '爬虫']},\n",
       " {'title': 'Python操作Mysql实例代码教程（查询手册）',\n",
       "  'link': 'http://www.crazyant.net/686.html',\n",
       "  'tags': ['mysql', 'python', '数据库']},\n",
       " {'title': 'MySQL-python Windows下EXE安装文件下载',\n",
       "  'link': 'http://www.crazyant.net/678.html',\n",
       "  'tags': ['python']},\n",
       " {'title': '数据采集必备知识-php计划任务的实现',\n",
       "  'link': 'http://www.crazyant.net/675.html',\n",
       "  'tags': ['php', '爬虫']},\n",
       " {'title': '个人博客SEO第一步-提交自己的网站',\n",
       "  'link': 'http://www.crazyant.net/658.html',\n",
       "  'tags': ['seo']},\n",
       " {'title': '情理之中又意料之外的超强减肥方法',\n",
       "  'link': 'http://www.crazyant.net/655.html',\n",
       "  'tags': ['程序人生']},\n",
       " {'title': '新浪微博的mid转换成base62格式的PHP函数',\n",
       "  'link': 'http://www.crazyant.net/647.html',\n",
       "  'tags': ['php', '爬虫']},\n",
       " {'title': 'windows下PHP环境（apache,PHP,Mysql）详细配置方法',\n",
       "  'link': 'http://www.crazyant.net/639.html',\n",
       "  'tags': ['apache', 'mysql', 'php', 'windows']},\n",
       " {'title': 'PHP100视频教程2012版解压密码',\n",
       "  'link': 'http://www.crazyant.net/636.html',\n",
       "  'tags': ['php']},\n",
       " {'title': 'Redis+Mysql模式和内存+硬盘模式的异同',\n",
       "  'link': 'http://www.crazyant.net/629.html',\n",
       "  'tags': ['mysql', 'redis']},\n",
       " {'title': 'Redis详细完整教程-windows下的安装、测试(php+redis+mysql)',\n",
       "  'link': 'http://www.crazyant.net/611.html',\n",
       "  'tags': ['mysql', 'php', 'redis']},\n",
       " {'title': 'Windows Live Writer快捷方式（打开服务器文档等）',\n",
       "  'link': 'http://www.crazyant.net/604.html',\n",
       "  'tags': ['wordpress']},\n",
       " {'title': 'PHP字符串函数、知识要点总结',\n",
       "  'link': 'http://www.crazyant.net/600.html',\n",
       "  'tags': ['php']},\n",
       " {'title': 'PHP数组使用、特性、函数的总结',\n",
       "  'link': 'http://www.crazyant.net/591.html',\n",
       "  'tags': ['php']},\n",
       " {'title': 'PHP对文件的操作总结',\n",
       "  'link': 'http://www.crazyant.net/581.html',\n",
       "  'tags': ['php']},\n",
       " {'title': 'PHP操作符可变变量测试变量等总结',\n",
       "  'link': 'http://www.crazyant.net/576.html',\n",
       "  'tags': ['php']},\n",
       " {'title': '有句话说的非常好',\n",
       "  'link': 'http://www.crazyant.net/548.html',\n",
       "  'tags': ['程序人生']},\n",
       " {'title': '[C++]数据结构之堆-上滤下滤以及用于排序',\n",
       "  'link': 'http://www.crazyant.net/545.html',\n",
       "  'tags': ['c++']},\n",
       " {'title': 'C++拆分字符串代码（实现split）',\n",
       "  'link': 'http://www.crazyant.net/540.html',\n",
       "  'tags': ['c++']},\n",
       " {'title': '看完这20部电影相当于学了经济学（投资理财必看电影）',\n",
       "  'link': 'http://www.crazyant.net/515.html',\n",
       "  'tags': ['程序人生']},\n",
       " {'title': 'Adobe Dreamweaver CS6官方简体中文版安装+破解过程',\n",
       "  'link': 'http://www.crazyant.net/512.html',\n",
       "  'tags': ['dreamweaver']},\n",
       " {'title': 'win7系统笔记本设置成虚拟WiFi热点（即“无线路由器”）',\n",
       "  'link': 'http://www.crazyant.net/505.html',\n",
       "  'tags': ['win7']},\n",
       " {'title': '按大小拆分超大文件的方法（本文测试了一个62G的文件）',\n",
       "  'link': 'http://www.crazyant.net/502.html',\n",
       "  'tags': ['linux', 'shell']},\n",
       " {'title': 'Dedecms备份还原网站有效方法',\n",
       "  'link': 'http://www.crazyant.net/499.html',\n",
       "  'tags': ['织梦']},\n",
       " {'title': 'WIN7下硬盘安装Ubuntu 11.10系统成功',\n",
       "  'link': 'http://www.crazyant.net/495.html',\n",
       "  'tags': ['ubuntu', 'win7']},\n",
       " {'title': '程序员找工作网啊站-计算机专业学生必看',\n",
       "  'link': 'http://www.crazyant.net/491.html',\n",
       "  'tags': ['程序人生']},\n",
       " {'title': '被深深鄙视的2012找暑期实习，哥很伤心',\n",
       "  'link': 'http://www.crazyant.net/485.html',\n",
       "  'tags': ['程序人生']},\n",
       " {'title': 'phpmyadmin远程连接mysql数据库的方法',\n",
       "  'link': 'http://www.crazyant.net/483.html',\n",
       "  'tags': ['mysql', 'php']},\n",
       " {'title': 'mysql用命令行链接远程主机的方法',\n",
       "  'link': 'http://www.crazyant.net/480.html',\n",
       "  'tags': ['mysql']},\n",
       " {'title': 'c/c++批量向mysql插入数据',\n",
       "  'link': 'http://www.crazyant.net/478.html',\n",
       "  'tags': ['c++', 'mysql']},\n",
       " {'title': '[站长]推荐一个网页分享按钮条插件JiaThis',\n",
       "  'link': 'http://www.crazyant.net/471.html',\n",
       "  'tags': ['站长']},\n",
       " {'title': '[网址]在线转换编码-BASE64_URLENCODE等',\n",
       "  'link': 'http://www.crazyant.net/466.html',\n",
       "  'tags': ['算法']},\n",
       " {'title': '[PHP]发送邮件方法介绍和代码示例',\n",
       "  'link': 'http://www.crazyant.net/454.html',\n",
       "  'tags': ['email', 'php']},\n",
       " {'title': '[C++]win32输出当前系统时间函数，可用以程序计时',\n",
       "  'link': 'http://www.crazyant.net/449.html',\n",
       "  'tags': ['c++']},\n",
       " {'title': '[转]office – word2010每次打开弹出配置框解决方法',\n",
       "  'link': 'http://www.crazyant.net/444.html',\n",
       "  'tags': ['office', 'word']},\n",
       " {'title': 'Linux下GCC和Makefile实例（从GCC的编译到Makefile的引入）',\n",
       "  'link': 'http://www.crazyant.net/414.html',\n",
       "  'tags': ['linux']},\n",
       " {'title': '让QT支持中文的方法',\n",
       "  'link': 'http://www.crazyant.net/410.html',\n",
       "  'tags': ['qt']},\n",
       " {'title': 'QT-creater一个非常棒的教程',\n",
       "  'link': 'http://www.crazyant.net/408.html',\n",
       "  'tags': ['qt']},\n",
       " {'title': '玩大灾变出现“igxprd32显示驱动程序已经停止正常工作”解决方法',\n",
       "  'link': 'http://www.crazyant.net/402.html',\n",
       "  'tags': ['魔兽世界']},\n",
       " {'title': '在GATE工具中使用自己的XSD模式进行语义标注',\n",
       "  'link': 'http://www.crazyant.net/391.html',\n",
       "  'tags': ['gate']},\n",
       " {'title': 'C++ Primer 4th：第九章 《顺序容器》学习心得',\n",
       "  'link': 'http://www.crazyant.net/381.html',\n",
       "  'tags': ['c++']},\n",
       " {'title': 'win7下快速硬盘安装ghost xp的方法',\n",
       "  'link': 'http://www.crazyant.net/259.html',\n",
       "  'tags': ['win7']},\n",
       " {'title': 'C++实现字符串与数字的连接',\n",
       "  'link': 'http://www.crazyant.net/254.html',\n",
       "  'tags': ['c++']},\n",
       " {'title': 'c++字符集之间转换(UTF-8,UNICODE,Gb2312)',\n",
       "  'link': 'http://www.crazyant.net/251.html',\n",
       "  'tags': ['c++']},\n",
       " {'title': '数据采集利器-PHP用DOM方式处理HTML之《Simple HTML DOM》',\n",
       "  'link': 'http://www.crazyant.net/245.html',\n",
       "  'tags': ['php', '爬虫']},\n",
       " {'title': 'C++数组类型学习笔记',\n",
       "  'link': 'http://www.crazyant.net/236.html',\n",
       "  'tags': ['c++']},\n",
       " {'title': 'C++标准库string类型学习笔记',\n",
       "  'link': 'http://www.crazyant.net/234.html',\n",
       "  'tags': ['c++']},\n",
       " {'title': '9个高质量图标的最佳搜索引擎',\n",
       "  'link': 'http://www.crazyant.net/224.html',\n",
       "  'tags': ['设计']},\n",
       " {'title': 'navicat-MySQL前台管理工具利器',\n",
       "  'link': 'http://www.crazyant.net/219.html',\n",
       "  'tags': ['mysql']},\n",
       " {'title': '17个非常有用的PHP类和库',\n",
       "  'link': 'http://www.crazyant.net/215.html',\n",
       "  'tags': ['php', '类库']},\n",
       " {'title': '使用DEDE的全国地区分类导入到其它CMS',\n",
       "  'link': 'http://www.crazyant.net/210.html',\n",
       "  'tags': ['php']},\n",
       " {'title': '对自己将来的一些思考',\n",
       "  'link': 'http://www.crazyant.net/206.html',\n",
       "  'tags': ['程序人生']},\n",
       " {'title': '推荐一个可以用u盘安装fedora,ubuntu等Linux的工具',\n",
       "  'link': 'http://www.crazyant.net/204.html',\n",
       "  'tags': ['linux']},\n",
       " {'title': 'ubuntu官方live cd和dvd下载地址',\n",
       "  'link': 'http://www.crazyant.net/198.html',\n",
       "  'tags': ['ubuntu']},\n",
       " {'title': '非常好用的一款磁盘管理工具Acronis Disk Director Suite',\n",
       "  'link': 'http://www.crazyant.net/195.html',\n",
       "  'tags': ['操作系统']},\n",
       " {'title': 'jQuery圆角工具jQuery Corner',\n",
       "  'link': 'http://www.crazyant.net/191.html',\n",
       "  'tags': ['jquery']},\n",
       " {'title': '筛选出来的常用jQuery幻灯片插件',\n",
       "  'link': 'http://www.crazyant.net/179.html',\n",
       "  'tags': ['jquery']},\n",
       " {'title': 'PHP-浏览器参数防注入检测函数',\n",
       "  'link': 'http://www.crazyant.net/161.html',\n",
       "  'tags': ['php']},\n",
       " {'title': 'PHP从checkbox取值',\n",
       "  'link': 'http://www.crazyant.net/159.html',\n",
       "  'tags': ['php']},\n",
       " {'title': '毕业设计出现的一个严重错误—-文件不能相互引用',\n",
       "  'link': 'http://www.crazyant.net/157.html',\n",
       "  'tags': ['程序人生']},\n",
       " {'title': 'php函数-计算两个日期相差多少天',\n",
       "  'link': 'http://www.crazyant.net/155.html',\n",
       "  'tags': ['php']},\n",
       " {'title': 'Apache禁止目录访问方法介绍',\n",
       "  'link': 'http://www.crazyant.net/153.html',\n",
       "  'tags': ['apache']},\n",
       " {'title': 'PHP操作EXCEL相关',\n",
       "  'link': 'http://www.crazyant.net/151.html',\n",
       "  'tags': ['php']},\n",
       " {'title': 'phpexcel-自己写的几个非常好用的函数',\n",
       "  'link': 'http://www.crazyant.net/149.html',\n",
       "  'tags': ['php']},\n",
       " {'title': 'Fckeditor-动态增减按钮的方法',\n",
       "  'link': 'http://www.crazyant.net/146.html',\n",
       "  'tags': ['fckeditor']},\n",
       " {'title': 'PHP验证码-类',\n",
       "  'link': 'http://www.crazyant.net/144.html',\n",
       "  'tags': ['php']},\n",
       " {'title': 'mysql-修改root密码的方法',\n",
       "  'link': 'http://www.crazyant.net/142.html',\n",
       "  'tags': ['mysql']},\n",
       " {'title': '获取服务器传来的数据-必须用JS去空格',\n",
       "  'link': 'http://www.crazyant.net/138.html',\n",
       "  'tags': ['javascript']},\n",
       " {'title': 'php实现当前用户在线人数',\n",
       "  'link': 'http://www.crazyant.net/136.html',\n",
       "  'tags': ['php']},\n",
       " {'title': '网上选课系统-进度',\n",
       "  'link': 'http://www.crazyant.net/134.html',\n",
       "  'tags': ['程序人生']},\n",
       " {'title': '屏幕取色工具',\n",
       "  'link': 'http://www.crazyant.net/132.html',\n",
       "  'tags': ['工具软件']},\n",
       " {'title': 'PHP-非常好用的文件操作类',\n",
       "  'link': 'http://www.crazyant.net/130.html',\n",
       "  'tags': ['php']},\n",
       " {'title': 'PHP删除无限分类并同时删除它下面的所有子分类的方法',\n",
       "  'link': 'http://www.crazyant.net/128.html',\n",
       "  'tags': ['php']},\n",
       " {'title': 'PHP获取IP的多种方式解析',\n",
       "  'link': 'http://www.crazyant.net/126.html',\n",
       "  'tags': ['php']},\n",
       " {'title': 'Javascript trim()函数实现',\n",
       "  'link': 'http://www.crazyant.net/124.html',\n",
       "  'tags': ['javascript']}]"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "all_datas"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "240"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(all_datas)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. Save the results\n",
    "1. MySQL\n",
    "2. A local JSON file"
   ]
  },
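  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Write one JSON object per line; ensure_ascii=False keeps the Chinese text readable,\n",
    "# so the file is opened explicitly as UTF-8 instead of the platform default encoding.\n",
    "with open(\"all_article_links.json\", \"w\", encoding=\"utf-8\") as fout:\n",
    "    for data in all_datas:\n",
    "        fout.write(json.dumps(data, ensure_ascii=False) + \"\\n\")"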
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
